Abstract

We consider the task of deriving a key with high HILL entropy (i.e., being computationally 
indistinguishable from a key with high min-entropy) from an unpredictable source. 

Prior to this work, the only known way to transform unpredictability into a key that
is $\epsilon$-indistinguishable from having min-entropy was via pseudorandomness, for example by
Goldreich-Levin (GL) hardcore bits. This approach has the inherent limitation that from a
source with $k$ bits of unpredictability entropy one can derive a key of length (and thus HILL
entropy) at most $k - 2\log(1/\epsilon)$ bits. In many settings, e.g. when dealing with biometric data,
such a $2\log(1/\epsilon)$ bit entropy loss is not an option.

Our main technical contribution is a theorem that states that in the high entropy regime,
unpredictability implies HILL entropy. Concretely, any variable $K$ with $|K| - d$ bits of
unpredictability entropy has the same amount of so-called metric entropy (against real-valued,
deterministic distinguishers), which is known to imply the same amount of HILL entropy. The
loss in circuit size in this argument is exponential in the entropy gap $d$, and thus this result only
applies for small $d$ (i.e., where the size of distinguishers considered is exponential in $d$).

To overcome the above restriction, we investigate if it’s possible to first “condense”
unpredictability entropy and make the entropy gap small. We show that any source with $k$ bits of
unpredictability can be condensed into a source of length $k$ with $k - 3$ bits of unpredictability
entropy. Our condenser simply “abuses” the GL construction and derives a $k$ bit key from a
source with $k$ bits of unpredictability. The original GL theorem implies nothing when extracting
that many bits, but we show that in this regime, GL still behaves like a “condenser” for
unpredictability. This result comes with two caveats: (1) the loss in circuit size is exponential in $k$,
and (2) we require that the source we start with has no HILL entropy (equivalently, one can
efficiently check if a guess is correct). We leave it as an intriguing open problem to overcome
these restrictions or to prove they’re inherent.


1 Introduction 

Key-derivation considers the following fundamental problem: Given a joint distribution $(X, Z)$
where $X|Z$ (which is short for “$X$ conditioned on $Z$”) is guaranteed to have some kind of entropy,
derive a “good” key $K = h(X, S)$ from $X$ by means of some efficient key-derivation function $h$,
possibly using public randomness $S$.

In practice, one often uses a cryptographic hash function like SHA3 as the key derivation function
$h(.)$ [Kra10, DGH+04], and then simply assumes that $h(.)$ behaves like a random oracle [BR93].

In this paper we continue the investigation of key-derivation with provable security guarantees,
where we don’t make any computational assumption about $h(.)$. This problem is fairly well
understood for sources $X|Z$ that have high min-entropy (we’ll formally define all the entropy notions
used in Section 2 below), or are computationally indistinguishable from having so (in this case, we say
$X|Z$ has high HILL entropy). In the case where $X|Z$ has $k$ bits of min-entropy, we can either use
a strong extractor to derive a key of length $k - 2\log\epsilon^{-1}$ that is $\epsilon$-close to uniform, or a
condenser to get a $k$ bit key which is $\epsilon$-close to a variable with $k - \log\log\epsilon^{-1}$ bits of
min-entropy. Using extractors/condensers like this also works for HILL entropy, except that now we
only get computational guarantees (pseudorandom/high HILL entropy) on the derived key.

Often one has to derive a key from a source $X|Z$ which has no HILL entropy at all. The
weakest assumption we can make on $X|Z$ for any kind of key-derivation to be possible is that
$X$ is hard to predict given $Z$. This has been formalized in [HLR07a] by saying that $X|Z$ has $k$
bits of unpredictability entropy, denoted $H^{\mathrm{unp}}_{s}(X|Z) \ge k$, if no circuit of size $s$ can predict $X$
given $Z$ with advantage greater than $2^{-k}$ (to be more general, we allow an additional parameter
$\delta \ge 0$, and $H^{\mathrm{unp}}_{\delta,s}(X|Z) \ge k$ holds if $(X, Z)$ is $\delta$-close to some distribution $(Y, Z)$ with
$H^{\mathrm{unp}}_{s}(Y|Z) \ge k$). We will also consider a more restricted notion, where we say that $X|Z$ has
$k$ bits of list-unpredictability entropy, denoted $H^{*\mathrm{unp}}_{s}(X|Z) \ge k$, if it has $k$ bits of
unpredictability entropy relative to an oracle Eq which can be used to verify the correct guess
(Eq outputs 1 on input $X$, and 0 otherwise).^1 We’ll discuss this notion in more detail below.
For now, let us just mention that for the important special case where it’s easy to verify if a
guess for $X$ is correct (say, because we condition on $Z = f(X)$ for some one-way function^2 $f$),
the oracle Eq does not help, and thus unpredictability and list-unpredictability coincide. The
results proven in this paper imply that from a source $X|Z$ with $k$ bits of list-unpredictability
entropy, it’s possible to extract a $k$ bit key with $k - 3$ bits of HILL entropy.

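Proposition 1. Consider a joint distribution $(X, Z)$ over $\{0,1\}^n \times \{0,1\}^m$ where

    $H^{*\mathrm{unp}}_{\gamma,s}(X|Z) \ge k$    (1)

Let $S \in \{0,1\}^{n \times k}$ be uniformly random and $K = X^T S \in \{0,1\}^k$, then the unpredictability
entropy of $K$ is

    $H^{\mathrm{unp}}_{\gamma,\, s/(2^{2k}\,\mathrm{poly}(m,n))}(K|Z,S) \ge k - 3$    (2)

and the HILL entropy of $K$ is

    $H^{\mathrm{HILL}}_{\epsilon+\gamma,\, t}(K|Z,S) \ge k - 3$    (3)

with^3 $t = s \cdot \epsilon^{7} / (2^{2k}\,\mathrm{poly}(m,n))$.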

^1 We chose this name as having access to Eq is equivalent to being allowed to output a list of guesses. This is very
similar to the well-known concept of list-decoding.

^2 To be precise, this only holds for injective one-way functions. One can generalise list-unpredictability and let Eq
output 1 on some set $\mathcal{X}$, and the adversary wins if she outputs any $x \in \mathcal{X}$. Our results (in particular Theorem 1)
also hold for this more general notion, which captures general one-way functions by letting $\mathcal{X} = f^{-1}(f(X))$ be the
set of all preimages of $Z = f(X)$.

^3 We denote with $\mathrm{poly}(m,n)$ some fixed polynomial in $(n,m)$, but it can denote different polynomials throughout
the paper. In particular, the poly here is not the same as in (2) as it hides several extra terms.
Proposition 1 follows from two results we prove in this paper.

First, in Section 4 we prove Theorem 1 which shows how to “abuse” Goldreich-Levin hardcore
bits by generating a $k$ bit key $K = X^T S$ from a source $X|Z$ with $k$ bits of list-unpredictability. The
Goldreich-Levin theorem [GL89] implies nothing about the pseudorandomness of $K|(Z,S)$ when
extracting that many bits. Instead, we prove that GL is a good “condenser” for unpredictability
entropy: if $X|Z$ has $k$ bits of list-unpredictability entropy, then $K|(Z,S)$ has $k - 3$ bits of
unpredictability entropy (note that we start with list-unpredictability, but only end up with “normal”
unpredictability entropy). This result is used in the first step in Proposition 1, showing that (1)
implies (2).

Second, in Section 5 we prove our main result, Theorem 2, which states that any source $X|Z$
which has $|X| - d$ bits of unpredictability entropy has the same amount of HILL entropy (technically,
we show that it implies the same amount of metric entropy against deterministic real-valued
distinguishers; this notion implies the same amount of HILL entropy as shown by Barak et al. [BSW03]).
The security loss in this argument is exponential in the entropy gap $d$. Thus, if $d$ is very large, this
argument is useless, but if we first condense unpredictability as just explained, we have a gap of
only $d = 3$. This result is used in the second step in Proposition 1, showing that (2) implies (3). In
the two sections below we discuss two shortcomings of Theorem 1 which we hope can be overcome
in future work.^4

1.0.1 On the dependency on $2^k$ in Theorem 1.

As outlined above, our first result is Theorem 1, which shows how to condense a source with $k$
bits of list-unpredictability into a $k$ bit key having $k - 3$ bits of unpredictability entropy. The loss
in circuit size is $2^{2k}\,\mathrm{poly}(m,n)$, and it’s not clear if the dependency on $2^k$ is necessary here, or if
one can replace the dependency on $2^k$ with a dependency on $\mathrm{poly}(\epsilon^{-1})$ at the price of an extra $\epsilon$
term in the distinguishing advantage. In many settings $\log(\epsilon^{-1})$ is in the order of $k$, in which case
the above difference is not too important. This is for example the case when considering a $k$ bit
key for a symmetric primitive like a block-cipher, where one typically assumes the hardness of the
cipher to be exponential in the key-length (and thus, if we want $\epsilon$ to be in the same order, we have
$\log(\epsilon^{-1}) = O(k)$). In other settings, $k$ can be superlinear in $\log(\epsilon^{-1})$, e.g., if the high entropy
string is used to generate an RSA key.

1.0.2 List vs. normal Unpredictability. 

Our Theorem 1 shows how to condense a source where $X|Z$ has $k$ bits of list-unpredictability
entropy into a $k$ bit string with $k - 3$ bits of unpredictability entropy. It’s an open question to which
extent it’s necessary to assume list-unpredictability here; maybe “normal” unpredictability is already
sufficient? Note that list-unpredictability is a lower bound for unpredictability as one always can
ignore the Eq oracle, i.e., $H^{*\mathrm{unp}}_{\epsilon,s}(X|Z) \le H^{\mathrm{unp}}_{\epsilon,s}(X|Z)$, and in general, list-unpredictability can be
much smaller than unpredictability entropy.^5

^4 After announcing this result at a workshop, we learned that Colin Jia Zheng proved a weaker version of this
result. Theorem 4.18 in his PhD thesis, which is available via http://dash.harvard.edu/handle/1/11745716, also
states that $k$ bits of unpredictability imply $k$ bits of HILL entropy. Like in our case, the loss in circuit size in his
proof is polynomial in $\epsilon^{-1}$, but it’s also exponential in $n$ (the length of $X$), whereas our loss is only exponential in
the entropy gap $\Delta = n - k$.

^5 E.g., let $X$ be uniform over $\{0,1\}^n$ and $Z$ arbitrary, but independent of $X$, then for $s = \exp(n)$ we have
$H^{\mathrm{unp}}_{s}(X|Z) = n$ but $H^{*\mathrm{unp}}_{s}(X|Z) = 0$ as we can simply invoke Eq on all of $\{0,1\}^n$ until $X$ is found.
Interestingly, we can derive a $k$ bit key with almost $k$ bits of HILL entropy from a source $X|Z$
which has $k$ bits of unpredictability entropy $H^{\mathrm{unp}}_{\epsilon,s}(X|Z) \ge k$ in two extreme cases, namely, if either

1. $X|Z$ has basically no HILL entropy (even against small circuits),

2. or $X|Z$ has (almost) $k$ bits of (high quality) HILL entropy.

In case 1, we observe that if $H^{\mathrm{HILL}}_{\epsilon,t}(X|Z) \approx 0$ for some $t \ll s$, or equivalently, given $Z$ we can
efficiently distinguish $X$ from any $X' \ne X$, then the Eq oracle used in the definition of
list-unpredictability can be efficiently emulated, which means it’s redundant, and thus $X|Z$ has the
same amount of list-unpredictability and unpredictability entropy, $H^{*\mathrm{unp}}_{\epsilon',s'}(X|Z) \approx H^{\mathrm{unp}}_{\epsilon,s}(X|Z)$ for
$(\epsilon',s') \approx (\epsilon,s)$. Thus, we can use Theorem 1 to derive a $k$ bit key with $k - O(1)$ bits of HILL
entropy in this case. In case 2, we can simply use any condenser for min-entropy to get a key with
HILL entropy $k - \log\log\epsilon^{-1}$ (cf. Figure 2). As condensing almost all the unpredictability entropy
into HILL entropy is possible in the two extreme cases where $X|Z$ has either no or a lot of HILL
entropy, it seems conceivable that it’s also possible in all the in-between cases (i.e., without making
any additional assumptions about $X|Z$ at all).

1.0.3 GL vs. Condensing. 

Let us stress at this point that, because of the two issues discussed above, our result does not always
allow one to generate more bits with high HILL entropy than just using the Goldreich-Levin theorem.
Assuming $k$ bits of unpredictability we get $k - 3$ bits of HILL entropy, whereas GL will only give
$k - 2\log(1/\epsilon)$. But as currently our reduction has a quantitatively larger loss in circuit size than
the GL theorem, in order to get HILL entropy of the same quality (i.e., secure against $(s, \delta)$
adversaries for some fixed $(s, \delta)$) we must consider the unpredictability entropy of the source $X|Z$
against more powerful adversaries than if we were to use GL. And in general, the amount of
unpredictability (or any other computational) entropy of $X|Z$ can decrease as we consider more
powerful adversaries.

2 Entropy Notions 

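In this section we formally define the different entropy notions considered in this paper. We denote
with $\mathcal{D}^{\mathrm{rand},\{0,1\}}_{s}$ the set of all probabilistic circuits of size $s$ with boolean output, and
$\mathcal{D}^{\mathrm{rand},[0,1]}_{s}$ denotes the set of all probabilistic circuits with real-valued output in the range $[0,1]$.
The analogous deterministic circuits are denoted $\mathcal{D}^{\mathrm{det},\{0,1\}}_{s}$ and $\mathcal{D}^{\mathrm{det},[0,1]}_{s}$. We use
$X \sim_{\epsilon,s} Y$ to denote computational indistinguishability of variables $X$ and $Y$, formally^6

    $X \sim_{\epsilon,s} Y \iff \forall C \in \mathcal{D}^{\mathrm{rand},\{0,1\}}_{s} : |\Pr[C(X) = 1] - \Pr[C(Y) = 1]| \le \epsilon$    (4)

$X \sim_{\epsilon} Y$ denotes that $X$ and $Y$ have statistical distance $\epsilon$, i.e., $X \sim_{\epsilon,\infty} Y$, and with $X \sim Y$ we
denote that they’re identically distributed. With $U_n$ we denote the uniform distribution over $\{0,1\}^n$.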

^6 Let us mention that the choice of the distinguisher class in (4) is irrelevant (up to a small additive difference in
circuit size); we can replace it with any of the three other distinguisher classes.

Definition 1. The min-entropy of a random variable $X$ with support $\mathcal{X}$ is

    $H_\infty(X) = -\log_2 \max_{x \in \mathcal{X}} \Pr[X = x]$

For a pair $(X, Z)$ of random variables, the average min-entropy of $X$ conditioned on $Z$ is

    $H_\infty(X|Z) = -\log_2 E_{z \leftarrow Z}\left[\max_{x} \Pr[X = x \mid Z = z]\right] = -\log_2 E_{z \leftarrow Z}\left[2^{-H_\infty(X|Z=z)}\right]$
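As a small illustration of Definition 1, the following sketch computes min-entropy and average min-entropy for explicitly given finite distributions. It is not part of the formal development; the function names and the toy distribution (two uniform bits, one of which is leaked) are assumptions made only for this example.

```python
import math
from collections import defaultdict

def min_entropy(dist):
    """H_inf(X) = -log2 max_x Pr[X = x]; dist maps outcomes to probabilities."""
    return -math.log2(max(dist.values()))

def average_min_entropy(joint):
    """Average min-entropy of X given Z.

    joint maps (x, z) pairs to probabilities. Summing, over z, the largest
    joint probability Pr[X = x, Z = z] equals E_z[max_x Pr[X = x | Z = z]].
    """
    best = defaultdict(float)
    for (x, z), p in joint.items():
        best[z] = max(best[z], p)
    return -math.log2(sum(best.values()))

# Hypothetical toy source: X is two uniform bits (a, b), Z leaks the bit a.
marginal_x = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
joint = {((a, b), a): 0.25 for a in (0, 1) for b in (0, 1)}

print(min_entropy(marginal_x))      # entropy of X alone
print(average_min_entropy(joint))   # entropy of X once Z is known
```

In this toy case leaking one bit of a two-bit uniform string removes exactly one bit of (average) min-entropy.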

HILL entropy is a computational variant of min-entropy, where $X$ (conditioned on $Z$) has $k$ bits
of HILL entropy, if it cannot be distinguished from some $Y$ that (conditioned on $Z$) has $k$ bits of
min-entropy, formally

Definition 2 ([HILL99], [HLR07a]). A random variable $X$ has HILL entropy $k$, denoted by
$H^{\mathrm{HILL}}_{\epsilon,s}(X) \ge k$, if there exists a distribution $Y$ satisfying $H_\infty(Y) \ge k$ and $X \sim_{\epsilon,s} Y$.

Let $(X, Z)$ be a joint distribution of random variables. Then $X$ has conditional HILL entropy
$k$ conditioned on $Z$, denoted by $H^{\mathrm{HILL}}_{\epsilon,s}(X|Z) \ge k$, if there exists a joint distribution $(Y, Z)$ such that
$H_\infty(Y|Z) \ge k$ and $(X,Z) \sim_{\epsilon,s} (Y,Z)$.

Barak, Shaltiel and Wigderson [BSW03] define the notion of metric entropy, which is defined
like HILL, but the quantifiers are exchanged. That is, instead of asking for a single distribution
$(Y, Z)$ that fools all distinguishers, we only ask that for every distinguisher $D$, there exists such a
distribution. For reasons discussed in Section 2.0.4, in the definition below we make the class of
distinguishers considered explicit.

Definition 3 ([BSW03], [FR12]). Let $(X,Z)$ be a joint distribution of random variables. Then $X$
has conditional metric entropy $k$ conditioned on $Z$ (against probabilistic boolean distinguishers),
denoted by $H^{\mathrm{Metric},\mathrm{rand},\{0,1\}}_{\epsilon,s}(X|Z) \ge k$, if for every $D \in \mathcal{D}^{\mathrm{rand},\{0,1\}}_{s}$ there exists a joint distribution
$(Y,Z)$ such that $H_\infty(Y|Z) \ge k$ and

    $|\Pr[D(X, Z) = 1] - \Pr[D(Y, Z) = 1]| \le \epsilon$

More generally, for class $\in \{\mathrm{rand}, \mathrm{det}\}$, range $\in \{[0,1], \{0,1\}\}$,
$H^{\mathrm{Metric},\mathrm{class},\mathrm{range}}_{\epsilon,s}(X|Z) \ge k$ if for every $D \in \mathcal{D}^{\mathrm{class},\mathrm{range}}_{s}$ such a $(Y, Z)$ exists.

Like HILL entropy, also unpredictability entropy, which we’ll define next, can be seen as a
computational variant of min-entropy. Here we don’t require indistinguishability as for HILL entropy,
but only that the variable is hard to predict.

Definition 4 ([HLR07a]). $X$ has unpredictability entropy $k$ conditioned on $Z$, denoted by
$H^{\mathrm{unp}}_{\epsilon,s}(X|Z) \ge k$, if $(X,Z)$ is $(\epsilon, s)$ indistinguishable from some $(Y,Z)$, where no probabilistic circuit
of size $s$ can predict $Y$ given $Z$ with probability better than $2^{-k}$, i.e.,

    $H^{\mathrm{unp}}_{\epsilon,s}(X|Z) \ge k \iff \exists (Y,Z),\ (X,Z) \sim_{\epsilon,s} (Y,Z)\ \ \forall C, |C| \le s : \Pr_{(y,z) \leftarrow (Y,Z)}[C(z) = y] \le 2^{-k}$    (5)

We also define a notion called “list-unpredictability”, denoted $H^{*\mathrm{unp}}_{\epsilon,s}(X|Z) \ge k$, which holds if
$H^{\mathrm{unp}}_{\epsilon,s}(X|Z) \ge k$ as in (5), but where $C$ additionally gets oracle access to a function Eq(.) which
outputs 1 on input $y$ and 0 otherwise. So, $C$ can efficiently test if some candidate guess for $y$ is
correct.^7

^7 We name this notion “list-unpredictability” as we get the same notion when instead of giving $C$ oracle access to
Eq(.), we allow $C(z)$ to output a list of guesses for $y$, not just one value, and require that $\Pr_{(y,z) \leftarrow (Y,Z)}[y \in C(z)] \le 2^{-k}$.
This notion is inspired by the well-known notion of list-decoding.
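To make the role of the Eq oracle concrete, here is a toy sketch (not taken from the paper) of a predictor for a uniform $n$-bit string that may test candidates with Eq before committing to a guess. The helper names and the exhaustive-search strategy are assumptions for illustration only, and the cost model simply counts oracle queries rather than circuit size.

```python
from itertools import product

def predict_with_oracle(n, eq, queries):
    """Test up to `queries` distinct candidates with eq; output the first hit,
    or the next untested candidate once the query budget is used up."""
    for i, cand in enumerate(product((0, 1), repeat=n)):
        if i >= queries:
            return cand          # budget exhausted: blind final guess
        if eq(cand):
            return cand          # the oracle confirmed this candidate
    return None                  # only reached if queries > 2**n and no hit

def success_probability(n, queries):
    """Exact success probability against X uniform over {0,1}^n."""
    wins = 0
    for x in product((0, 1), repeat=n):
        guess = predict_with_oracle(n, lambda g, x=x: g == x, queries)
        wins += (guess == x)
    return wins / 2**n

for q in (0, 7, 15):
    print(q, success_probability(4, q))
```

Without queries the predictor succeeds with probability $2^{-n}$, while with close to $2^n$ queries it always succeeds, which mirrors why list-unpredictability can be much smaller than unpredictability against large adversaries.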
Remark 1 (The $\epsilon$ parameter). The $\epsilon$ parameter in the definition above is not really necessary;
following [HLR07b], we added it so we can have a “smooth” notion, which is easier to compare to
HILL or smooth min-entropy. If $\epsilon = 0$, we’ll simply omit it, then the definition simplifies to

    $H^{\mathrm{unp}}_{s}(X|Z) \ge k \iff \forall C, |C| \le s : \Pr_{(x,z) \leftarrow (X,Z)}[C(z) = x] \le 2^{-k}$

Let us also mention that unpredictability entropy is only interesting if the conditional part $Z$ is not
empty as (already for $s$ that is linear in the length of $X$) we have $H^{\mathrm{unp}}_{s}(X) = H_\infty(X)$, which can
be seen by considering the circuit $C$ (that gets no input as $Z$ is empty) which simply outputs the
constant $x$ maximizing $\Pr[X = x]$.

2.0.4 Metric vs. HILL. 

We will use a lemma which states that deterministic real-valued metric entropy implies the same
amount of HILL entropy (albeit with some loss in quality). This lemma has been proven by
[BSW03] for the unconditional case, i.e., when $Z$ in the lemma below is empty; it has been observed
by [FR12, CKLR11] that the proof also holds in the conditional case as stated below.

Lemma 1 ([BSW03, FR12, CKLR11]). For any joint distribution $(X,Z) \in \{0,1\}^n \times \{0,1\}^m$ and
any $\epsilon, \delta, k, s$

    $H^{\mathrm{Metric},\mathrm{det},[0,1]}_{\epsilon,s}(X|Z) \ge k \implies H^{\mathrm{HILL}}_{\epsilon+\delta,\,\Omega(s\delta^2/(n+m))}(X|Z) \ge k$

Note that in Definition 2 of HILL entropy, we only consider security against probabilistic boolean
distinguishers (as it was defined this way), whereas in Definition 3 of metric entropy we make the
class of distinguishers explicit. The reason for this is that in the definition of HILL entropy the
class of distinguishers considered is irrelevant (except for a small additive degradation in circuit size,
cf. [FR12, Lemma 2.1]).^8 Unlike for HILL, for metric entropy the choice of the distinguisher class
does matter. In particular, deterministic boolean metric entropy $H^{\mathrm{Metric},\mathrm{det},\{0,1\}}_{\epsilon,s}(X|Z) \ge k$ is only
known to imply deterministic real-valued metric entropy $H^{\mathrm{Metric},\mathrm{det},[0,1]}_{\epsilon+\delta,s}(X|Z) \ge k - \log(\delta^{-1})$, i.e.,
we must allow for a $\delta > 0$ loss in distinguishing advantage, and this will at the same time result in
a loss of $\log(\delta^{-1})$ in the amount of entropy. For this reason, it is crucial that in Theorem 2 we show
that unpredictability entropy implies deterministic real-valued metric entropy, so we can then apply
Lemma 1 to get the same amount of HILL entropy. Dealing with real-valued distinguishers is the
main source of technical difficulty in the proof of Theorem 2; proving the analogous statement
for deterministic boolean distinguishers is much simpler.

3 Known Results on Provably Secure Key-Derivation 

We say that a cryptographic scheme has security $\alpha$, if no adversary (from some class of adversaries
like all polynomial size circuits) can win some security game with advantage $\ge \alpha$ if the scheme is
instantiated with a uniformly random string.^9 Below we will distinguish between unpredictability
applications, where the advantage bounds the probability of winning some security game (a typical
example are digital signature schemes, where the game captures the existential unforgeability under
chosen message attacks), and indistinguishability applications, where the advantage bounds the
distinguishing advantage from some ideal object (a typical example is the security definition of
pseudorandom generators or functions).

^8 This easily follows from the fact that in the definition (4) of computational indistinguishability the choice of the
distinguisher class is irrelevant.

^9 We’ll call this string “key”. Though in many settings (in particular when keys are not simply uniform random
strings, like in public-key crypto) this string is not used as a key directly, but one rather should think of it as the
randomness used to sample the actual keys.

3.1 Key-Derivation from Min-Entropy 

Strong Extractors. Let $(X,Z)$ be a source where $H_\infty(X|Z) \ge k$, or equivalently, no adversary
can guess $X$ given $Z$ with probability better than $2^{-k}$ (cf. Def. 1). Consider the case where we
want to derive a key $K = h(X,S)$ that is statistically close to uniform given $(Z,S)$. For example,
$X$ could be some physical source (like statistics from keystrokes) from which we want to generate
almost uniform randomness. Here $Z$ models potential side-information the adversary might have
on $X$. This setting is very well understood, and such a key can be derived using a strong extractor
as defined below.

Definition 5 ([NZ93], [DORS08]). A function $\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^\ell$ is an average-case
$(k, \epsilon)$-strong extractor if for every distribution $(X, Z)$ over $\{0,1\}^n \times \{0,1\}^m$ with $H_\infty(X|Z) \ge k$ and
$S \sim U_d$, the distribution $(\mathrm{Ext}(X, S), S, Z)$ has statistical distance at most $\epsilon$ to $(U_\ell, S, Z)$.

Extractors Ext as above exist with $\ell = k - 2\log(1/\epsilon)$ [HILL99]. Thus, from any $(X,Z)$ where
$H_\infty(X|Z) \ge k$ we can extract a key $K = \mathrm{Ext}(X, S)$ of length $k - 2\log(1/\epsilon)$ that is $\epsilon$ close to
uniform [HILL99]. The entropy gap $2\log(1/\epsilon)$ is optimal by the so-called “RT-bound” [RTS00],
even if we assume the source is efficiently samplable [DPW14].

If instead of using a uniform $\ell$ bit key for an $\alpha$ secure scheme, we use a key that is $\epsilon$ close to
uniform, the scheme will still be at least $\beta = \alpha + \epsilon$ secure. In order to get security $\beta$ that is of the
same order as $\alpha$, we thus must set $\epsilon \approx \alpha$. When the available amount $k$ of min-entropy is small, for
example when dealing with biometric data [DORS08, BDK+05], a loss of $2\log(1/\epsilon)$ bits (that’s 160
bits for a typical security level $\epsilon = 2^{-80}$) is often unacceptable.
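The arithmetic behind this entropy loss can be checked with a few lines. The sketch below is only a calculator for the bound $\ell = k - 2\log(1/\epsilon)$ quoted above; the function name and the example source sizes are assumptions chosen for illustration.

```python
import math

def extractor_key_length(k, eps):
    """Length of an eps-close-to-uniform key obtainable from k bits of
    min-entropy via the bound ell = k - 2*log2(1/eps); 0 if nothing is left."""
    return max(0, math.floor(k - 2 * math.log2(1 / eps)))

eps = 2 ** -80                      # the security level used in the text
loss = 2 * math.log2(1 / eps)       # entropy gap 2*log(1/eps)

print(loss)                             # bits lost to the extractor
print(extractor_key_length(256, eps))   # a 256-bit-entropy source
print(extractor_key_length(128, eps))   # a 128-bit-entropy source
```

For a source with only 128 bits of min-entropy the bound leaves no key material at this security level, which is the situation the condensers discussed next are meant to address.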

Condensers. The above bound is basically tight for many indistinguishability applications like
pseudorandom generators or pseudorandom functions.^10 Fortunately, for many applications a close
to uniform key is not necessary, and a key $K$ with min-entropy $|K| - \Delta$ for some small $\Delta$ is basically
as good as a uniform one. This is the case for all unpredictability applications, which includes OWFs,
digital signatures and MACs.^11 It’s not hard to show that if the scheme is $\alpha$ secure with a uniform
key it remains at least $\beta = \alpha 2^{\Delta}$ secure (against the same class of attackers) if instantiated with any
key $K$ that has $|K| - \Delta$ bits of min-entropy.^12 Thus, for unpredictability applications we don’t have
to extract an almost uniform key, but “condensing” $X$ into a key with $|K| - \Delta$ bits of min-entropy
for some small $\Delta$ is enough.

^10 For example, consider a pseudorandom function $F : \{0,1\}^k \times \{0,1\}^a \to \{0,1\}$ and a key $K$ that is uniform over
all keys where $F(K, 0) = 0$; this distribution is $\epsilon \approx 1/2$ close to uniform and has min-entropy $\approx |K| - 1$, but the
security breaks completely as one can distinguish $F(U_k, .)$ from $F(K, .)$ with advantage $\beta \approx 1/2$ (by querying on input
0, and outputting 1 iff the output is 0).

^11 [DY13] identify an interesting class of applications called “square-friendly”; this class contains all unpredictability
applications, and some indistinguishability applications like weak PRFs (which are PRFs that can only be queried
on random inputs). This class of applications remains somewhat secure even for a small entropy gap $\Delta$: For $\Delta = 1$
the security is $\beta \approx \sqrt{\alpha}$. This is worse than the $\beta = 2\alpha$ for unpredictability applications, but much better than the
complete loss of security $\beta \approx 1/2$ incurred for some indistinguishability applications like (standard) PRFs.

^12 Assume some adversary breaks the scheme, say, forges a signature, with advantage $\beta$ if the key comes from the
distribution $K$. If we sample a uniform key instead, it will have the same distribution as $K$ conditioned on an event
that holds with probability $2^{-\Delta}$, and thus this adversary will still break the scheme with probability $\beta/2^{\Delta}$.

[DPW14] show that a $(\log\epsilon^{-1} + 1)$-wise independent hash function $\mathrm{Cond} : \{0,1\}^n \times \{0,1\}^d \to
\{0,1\}^\ell$ is a condenser with the following parameters. For any $(X,Z)$ where $H_\infty(X|Z) \ge \ell$, for
a random seed $S$ (used to sample a $(\log\epsilon^{-1} + 1)$-wise independent hash function), the distribution
$(\mathrm{Cond}(X, S), S)$ is $\epsilon$ close to a distribution $(Y,S)$ where $H_\infty(Y|Z) \ge \ell - \log\log(1/\epsilon)$. Using such
an $\ell$ bit key (condensed from a source with $\ell$ bits of min-entropy) for an unpredictability application
that is $\alpha$ secure (when using a uniform $\ell$ bit key), we get security $\beta \le \alpha\log(1/\epsilon) + \epsilon$, which setting
$\epsilon = \alpha$ gives $\beta \le \alpha(1 + \log(1/\alpha))$ security; thus, security degrades only by a logarithmic factor.
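The logarithmic degradation can be checked numerically. The sketch below assumes the entropy gap $\Delta = \log\log(1/\epsilon)$ of the condenser and the generic bound $\beta = \alpha 2^{\Delta} + \epsilon$ for unpredictability applications; the function names and the choice $\alpha = \epsilon = 2^{-64}$ are illustrative assumptions, not values from the paper.

```python
import math

def condenser_entropy_gap(eps):
    """Entropy gap Delta = log2(log2(1/eps)) left by the condenser."""
    return math.log2(math.log2(1 / eps))

def unpredictability_security(alpha, gap, eps=0.0):
    """Security beta = alpha * 2**gap + eps when an alpha-secure scheme is
    keyed with a key that is eps-close to having an entropy gap of `gap`."""
    return alpha * 2 ** gap + eps

alpha = eps = 2 ** -64
gap = condenser_entropy_gap(eps)
beta = unpredictability_security(alpha, gap, eps)

print(gap)            # entropy gap in bits
print(beta / alpha)   # multiplicative security loss, 1 + log2(1/alpha)
```

Under these assumptions the condensed key loses only a factor of $1 + \log(1/\alpha)$ in security, whereas an extractor at the same $\epsilon$ would have shortened the key by $2\log(1/\epsilon)$ bits.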

3.2 Key-Derivation from Computational Entropy 

The bounds discussed in this section are summarised in Figures 1 and 2 in Appendix A. The last 
row of Figure 2 is the new result proven in this paper. 

HILL Entropy. As already discussed in the introduction, often we want to derive a key from a
distribution $(X,Z)$ where there’s no “real” min-entropy at all, i.e., $H_\infty(X|Z) = 0$. This is for example
the case when $Z$ is the transcript (that can be observed by an adversary) of a key-exchange protocol
like Diffie-Hellman, where the agreed value $X = g^{ab}$ is determined by the transcript $Z = (g^a, g^b)$
[Kra10, GKR04]. Another setting where this can be the case is in the context of side-channel attacks,
where the leakage $Z$ from a device can completely determine its internal state $X$.

If $X|Z$ has $k$ bits of HILL entropy, i.e., is computationally indistinguishable from having
min-entropy $k$ (cf. Def. 2), we can derive keys exactly as described above assuming $X|Z$ had $k$ bits of
min-entropy. In particular, if $X|Z$ has $|K| + 2\log(1/\epsilon)$ bits of HILL entropy for some negligible $\epsilon$,
we can derive a key $K$ that is pseudorandom, and if $X|Z$ has $|K| + \log\log(1/\epsilon)$ bits of HILL entropy,
we can derive a key that is almost as good as a uniform one for any unpredictability application.

Unpredictability Entropy. Clearly, the minimal assumption we must make on a distribution
$(X,Z) \in \{0,1\}^n \times \{0,1\}^m$ for any key derivation to be possible at all is that $X$ is hard to compute
given $Z$, that is, $X|Z$ must have some unpredictability entropy as in Definition 4. Goldreich and
Levin [GL89] show how to generate pseudorandom bits from such a source. In particular, the
Goldreich-Levin theorem implies that if $X|Z$ has at least $2\log\epsilon^{-1}$ bits of list-unpredictability, then
the inner product $R^T X$ of $X$ with a random vector $R$ is $\epsilon$ indistinguishable from uniformly random
(the loss in circuit size is $\mathrm{poly}(n, m)/\epsilon^4$). Using the chain rule for unpredictability entropy,^13 we can
generate an $\ell = k - 2\log\epsilon^{-1}$ bit long pseudorandom string that is $\ell\epsilon$ indistinguishable (the extra $\ell$
factor comes from taking the union bound over all bits) from uniform.
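The extraction step itself is just a matrix-vector product over GF(2). The following minimal sketch computes $\ell$ inner-product bits $K = S^T X$ from an $n$-bit input and an $n \times \ell$ seed matrix; the function names and the list-of-bits representation are assumptions made for this illustration, and the code says nothing about the security of the output.

```python
import secrets

def random_seed_matrix(n, ell):
    """Sample a uniformly random n-by-ell bit matrix S (list of rows)."""
    return [[secrets.randbits(1) for _ in range(ell)] for _ in range(n)]

def gl_bits(x, seed):
    """Return K = S^T x over GF(2): one inner-product bit per column of S.

    x is a list of n bits; seed is an n-by-ell bit matrix given as rows.
    """
    n, ell = len(seed), len(seed[0])
    if len(x) != n:
        raise ValueError("x and seed have incompatible dimensions")
    return [sum(x[i] & seed[i][j] for i in range(n)) % 2 for j in range(ell)]

# Fixed example: n = 3 input bits, ell = 2 extracted bits.
x = [1, 0, 1]
seed = [[1, 0],
        [1, 1],
        [0, 1]]
key = gl_bits(x, seed)
print(key)

# With a fresh random seed the output has the requested length.
print(len(gl_bits(x, random_seed_matrix(3, 2))))
```

Each output bit is the parity of the input bits selected by one column of the seed, so the seed has $n\ell$ bits in total.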

Thus, we can turn $k$ bits of list-unpredictability into $k - 2\log\epsilon^{-1}$ pseudorandom bits
(and thus also that much HILL entropy) with quality roughly $\epsilon$. The question whether it’s possible
to generate significantly more than $k - 2\log\epsilon^{-1}$ bits of HILL entropy from a source with $k$ bits of
(list-)unpredictability seems to have never been addressed in the literature before. The reason might
be that one usually is interested in generating pseudorandom bits (not just HILL entropy), and for
this, the $2\log\epsilon^{-1}$ entropy loss is inherent. The observation that for many applications high HILL
entropy is basically as good as pseudorandomness is more recent, and recently gained attention by
its usefulness in the context of leakage-resilient cryptography [DP08, DY13].

In this paper we prove that it’s in fact possible to turn almost all list-unpredictability into HILL
entropy.

^13 Which states that if $X|Z$ has $k$ bits of list-unpredictability, then for any $(A, R)$ where $R$ is independent of $(X, Z)$,
$X|(Z, A, R)$ has $k - |A|$ bits of list-unpredictability entropy. In particular, extracting $\ell$ inner product bits decreases
the list-unpredictability by at most $\ell$.

4 Condensing Unpredictability 

Below we state Theorem 1 whose proof is in Appendix B, but first, let us give some intuition. Let
$X|Z$ have $k$ bits of list-unpredictability, and assume we start extracting Goldreich-Levin hardcore
bits $A_1, A_2, \ldots$ by taking inner products $A_i = R_i^T X$ for random $R_i$. The first extracted bits
$A_1, A_2, \ldots$ will be pseudorandom (given the $R_i$ and $Z$), but with every extracted bit, the
list-unpredictability can also decrease by one bit. As the GL theorem requires at least $2\log\epsilon^{-1}$ bits of
list-unpredictability to extract an $\epsilon$ secure pseudorandom bit, we must stop after $k - 2\log\epsilon^{-1}$ bits.
In particular, the more we extract, the worse the pseudorandomness of the extracted string becomes.
Unlike the original GL theorem, in our Theorem 1 we only argue about the unpredictability of the
extracted string, and unpredictability entropy has the nice property that it can never decrease, i.e.,
predicting $A_1, \ldots, A_{i+1}$ is always at least as hard as predicting $A_1, \ldots, A_i$. Thus, despite the fact
that once $i$ approaches $k$ it becomes easier and easier to predict $A_i$ (given $A_1, \ldots, A_{i-1}$, $Z$ and the
$R_i$’s),^14 this hardness will still add up to $k - O(1)$ bits of unpredictability entropy.

The proof is by contradiction: we assume that $A_1, \ldots, A_k$ can be predicted with advantage $2^{-k+3}$
(i.e., does not have $k - 3$ bits of unpredictability), and then use such a predictor to predict $X$ with
advantage $> 2^{-k}$, contradicting the $k$ bit list-unpredictability of $X|Z$.

If $A_1, \ldots, A_k$ can be predicted as above, then there must be an index $j$ s.t. $A_j$ can be predicted
with good probability conditioned on $A_1, \ldots, A_{j-1}$ being correctly predicted. We then can use the
Goldreich-Levin theorem, which tells us how to find $X$ given such a predictor. Unfortunately, $j$ can
be close to $k$, and to apply the GL theorem, we first need to find the right values for $A_1, \ldots, A_{j-1}$
on which we condition, and also can only use the predictor’s guess for $A_j$ if it was correct on the
first $j - 1$ bits. We have no better strategy for this than trying all possible values, and this is the
reason why the loss in circuit size in Theorem 1 depends on $2^k$.
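The existence of such an index $j$ can be seen by a simple averaging step. The following is only a rough sketch for a fixed predictor whose overall success probability is at least $2^{-k+3}$; the events $E_j$ and the quantities $p_j$ are notation introduced here for illustration, and the argument actually used in Appendix B is more refined.

```latex
% E_j : the predictor's guesses for A_1, ..., A_j are all correct (E_0 always holds).
% p_j : conditional success probability on the j-th bit.
\begin{align*}
p_j &:= \Pr[E_j \mid E_{j-1}], \qquad \Pr[E_0] = 1, \\
\Pr[E_k] &= \prod_{j=1}^{k} p_j \;\ge\; 2^{-k+3}
  \quad\Longrightarrow\quad
  \max_{1 \le j \le k} p_j \;\ge\; \bigl(2^{-k+3}\bigr)^{1/k}
  \;=\; \tfrac{1}{2}\cdot 2^{3/k} \;>\; \tfrac{1}{2}.
\end{align*}
```

So at least one bit is predicted strictly better than a coin flip once we condition on the earlier guesses being right, which is the kind of predictor a Goldreich-Levin style decoder can exploit.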

In our proof, instead of using the Goldreich-Levin theorem, we will actually use a more
fine-grained variant due to Hast which allows one to distinguish between errors and erasures (i.e., cases
where we know that we don’t have any good guess). As outlined above, this will be the case whenever the
predictor’s guess for the first $j - 1$ inner products was wrong, and thus we can’t assume anything
about the $j$th guess being correct. This will give a much better quantitative bound than what
seems possible using GL.

Theorem 1 (Condensing Unpredictability Entropy). Consider any distribution $(X, Z)$ over
$\{0,1\}^n \times \{0,1\}^m$ where

    $H^{*\mathrm{unp}}_{\epsilon,s}(X|Z) \ge k$

then for a random $R \leftarrow \{0,1\}^{k \times n}$

    $H^{\mathrm{unp}}_{\epsilon,t}(R.X|Z,R) \ge k - \Delta$

where^15

    $t = \frac{s}{2^{2k}\,\mathrm{poly}(m,n)}$,    $\Delta = 3$

^14 The only thing we know about the last extracted bit $A_k$ is that it cannot be predicted with advantage $\ge 0.75$;
more generally, $A_{k-j}$ cannot be predicted with advantage $1/2 + 1/2^{j+2}$.


5 High Unpredictability implies Metric Entropy 


In this section we state our main results, showing that k bits of unpredictability entropy imply 
the same amount of HILL entropy, with a loss exponential in the “entropy gap". The proof is in 
Appendix C. 

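Theorem 2 (Unpredictability Entropy Implies HILL Entropy). For any distribution $(X,Z)$ over
$\{0,1\}^n \times \{0,1\}^m$, if $X|Z$ has unpredictability entropy

    $H^{\mathrm{unp}}_{\gamma,s}(X|Z) \ge k$    (6)

then, with $\Delta = n - k$ denoting the entropy gap, $X|Z$ has (real-valued, deterministic) metric entropy

    $H^{\mathrm{Metric},\mathrm{det},[0,1]}_{\epsilon+\gamma,\,t}(X|Z) \ge k$  for  $t = \Omega\left(s \cdot \frac{\epsilon^{5}}{2^{5\Delta}\log^{2}(2^{\Delta}\epsilon^{-1})}\right)$    (7)

By Lemma 1 this further implies that $X|Z$ has, for any $\delta > 0$, HILL entropy

    $H^{\mathrm{HILL}}_{\epsilon+\delta+\gamma,\,\Omega(t\delta^{2}/(n+m))}(X|Z) \ge k$

which for $\epsilon = \delta = \gamma$ is

    $H^{\mathrm{HILL}}_{3\gamma,\,\Omega(s\cdot\gamma^{7}/(2^{5\Delta}(n+m)\log^{2}(2^{\Delta}\gamma^{-1})))}(X|Z) \ge k$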
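The exponent 7 in the final bound can be traced by composing the two steps. The sketch below assumes that the metric-entropy bound holds against circuits of size $t = \Omega(s\,\epsilon^{5}/(2^{5\Delta}\log^{2}(2^{\Delta}\epsilon^{-1})))$ and that the transition from metric to HILL entropy costs a factor $\delta^{2}/(n+m)$ in circuit size; hidden constants are ignored.

```latex
\begin{align*}
t &= \Omega\!\left(\frac{s\,\epsilon^{5}}{2^{5\Delta}\log^{2}(2^{\Delta}\epsilon^{-1})}\right), \\[2pt]
\left.\frac{t\,\delta^{2}}{n+m}\right|_{\epsilon=\delta=\gamma}
  &= \Omega\!\left(\frac{s\,\gamma^{5}\cdot\gamma^{2}}{2^{5\Delta}(n+m)\log^{2}(2^{\Delta}\gamma^{-1})}\right)
   = \Omega\!\left(\frac{s\,\gamma^{7}}{2^{5\Delta}(n+m)\log^{2}(2^{\Delta}\gamma^{-1})}\right), \\[2pt]
\left.(\epsilon+\delta+\gamma)\right|_{\epsilon=\delta=\gamma} &= 3\gamma .
\end{align*}
```

For a condensed key with entropy gap $\Delta = 3$, the factor $2^{5\Delta} = 2^{15}$ is a constant, so the loss is polynomial in $\gamma^{-1}$, $n$ and $m$ in that case.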
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A Figures 


Deriving a (pseudo)random key of length \K\ = k — 2loge ^ 
from a source (X, Z) G {0,1}^ x {0,1}^ where X\Z has k bits (min/HILL/list-unpredictability) entropy 

Entropy 

type 

Entropy quantity and 
quality of source 

Derive key K of 
length k — 21oge^^ as 

Quality of derived key 
Hf:'f}-{K\Z,S) = k- 21oge-i = \K\ 
equivalently 
{K,Z,S) 

min 

H„^{X\Z) = k 

K = Ext(X, S) 

e' = e s' = oo 

HILL 

H^)f^{X\Z) = k 

K = Ext(X, S) 

e' = e-k S s' ^ s 

Unpredict. 


K = GL{X,S) =S‘^'X 

e' = me + <5 s' = s ■ e''‘/poly{m, n) 


Figure 1: Bounds on deriving a (pseudo)random key $K$ of length $|K| = k - 2\log\epsilon^{-1}$ bits from a source $X|Z$ with $k$ bits of min, HILL or list-unpredictability entropy. Ext is a strong extractor (e.g. leftover hashing), and GL denotes the Goldreich-Levin construction, which for $X \in \{0,1\}^n$ and $S \in \{0,1\}^{n \times |K|}$ is simply defined as $GL(X, S) = S^T X$. Leftover hashing requires a seed of length $|S| = 2n$ (extractors with a much shorter seed $|S| = O(\log n + \log \epsilon^{-1})$ that extract $k - 2\log\epsilon^{-1} - O(1)$ bits also exist), whereas Goldreich-Levin requires a longer $|S| = |K|n$ bit seed. The above bound for HILL entropy even holds if $X|Z$ only has $k$ bits of probabilistic boolean metric entropy (a notion implying the same amount of HILL entropy, albeit with a loss in circuit size), as shown in Theorem 2.5 of [FR12].


Deriving a $k$ bit key $K$ with high HILL entropy from $X|Z$ with $k$ bits of (min/HILL/list-unpredictability) entropy

Entropy 

type 

Entropy quantity and 
quality of source

Derive key of 
length \K\ = k as 

Quantity and quality of HILL entropy of K 

H»'fy{K\Z,S) ^k-A 

min 

H^{X\Z) = k 

K = Cond{X,S) 

e' = e s' = oo A = log log 

HILL 

= k 

K = Cond(X,S) 

e'= e + 5 s'^ s A = log log 

Unpredict. 


K = GL{X,S) = S'^’X 

e' = e-\-S s'= s ■ e' /2'‘"^poly{m,n) A = 3 


Figure 2: Bounds on deriving a key of length $k$ with min (or HILL) entropy $k - \Delta$ from a source $X|Z$ with $k$ bits of min, HILL or unpredictability entropy. Cond denotes a $(\log \epsilon^{-1} + 1)$-wise independent hash function, which is shown to be a good condenser (as stated in the table) for min-entropy in [DPW14]. The bounds for HILL entropy follow directly from the bound for min-entropy. The last row follows from the results in this paper as stated in Proposition 1.


B Proof of Theorem 1 

We will use the following theorem due to Hast [Has03] on decoding the Hadamard code with errors and erasures.

Theorem 3 ([Has03]). There is an algorithm LD that, on input $l$ and $n$ and with oracle access to a binary Hadamard code of $x$ (where $|x| = n$) with an $e$-fraction of errors and an $s$-fraction of erasures, can output a list of $2^l$ elements in time $O(nl2^l)$ asking $n2^l$ oracle queries such that the probability that $x$ is contained in the list is at least 0.8 if $l \ge \log_2(20n(e + c)/(c - e)^2 + 1)$, where $c = 1 - s - e$ (the fraction of the correct answers from the oracle).
We'll often consider sequences $v_1, v_2, \ldots$ of values and will use the notation $v_a^b$ to denote $(v_a, \ldots, v_b)$, with $v_a^b = \emptyset$ if $a > b$. $v^b$ is short for $v_1^b = (v_1, \ldots, v_b)$.

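Proof of Theorem 1. It's sufficient to prove the theorem for $\epsilon = 0$; the general case $\epsilon \ge 0$ then follows directly by the definition of unpredictability entropy. To prove the theorem we'll prove its contraposition

$$H^{\mathrm{unp}}_{t}(R.X|Z,R) < k - \Delta \;\Rightarrow\; H^{\mathrm{unp}}_{s}(X|Z) < k \qquad (8)$$

The left-hand side of (8) means there exists a circuit $\mathsf{A}$ of size $|\mathsf{A}| \le t$ such that

$$\Pr_{(x,z) \leftarrow (X,Z),\, r \leftarrow \{0,1\}^{k \times n}}[\mathsf{A}(z,r) = r.x] \ge 2^{-k+\Delta} \qquad (9)$$

It will be convenient to assume that $\mathsf{A}$ initially flips a coin $b$, and if $b = 0$ outputs a uniformly random guess. This loses at most a factor 2 in $\mathsf{A}$'s advantage, i.e.,

$$\Pr_{(x,z) \leftarrow (X,Z),\, r \leftarrow \{0,1\}^{k \times n}}[\mathsf{A}(z,r) = r.x] \ge 2^{-k+\Delta-1} \qquad (10)$$

but now we can assume that for any $z$, $r$ and $w \in \{0,1\}^k$

$$\Pr[\mathsf{A}(z,r) = w] \ge 2^{-k-1} \qquad (11)$$

Using Markov, eq. (10) gives us

$$\Pr_{(x,z) \leftarrow (X,Z)}\Big[\Pr_{r \leftarrow \{0,1\}^{k \times n}}[\mathsf{A}(z,r) = r.x] \ge 2^{-k+\Delta-2}\Big] \ge 2^{-k+\Delta-2} \qquad (12)$$

We call $(x,z) \in \mathrm{supp}[(X,Z)]$ "good" if

$$(x,z) \text{ is good} \iff \Pr_{r \leftarrow \{0,1\}^{k \times n}}[\mathsf{A}(z,r) = r.x] \ge 2^{-k+\Delta-2} \qquad (13)$$

Note that by eq. (12), $(x,z) \leftarrow (X,Z)$ is good with probability $\ge 2^{-k+\Delta-2}$.

We will use $\mathsf{A}$ to construct a new circuit $\mathsf{B}$ of size $s = O(t\,2^{2k}\,\mathrm{poly}(n))$ where

$$\Pr_{(x,z) \leftarrow (X,Z)}[\mathsf{B}(z) = x \mid (x,z) \text{ is good}] > 1/2 \qquad (14)$$

which with (12) further gives

$$\Pr_{(x,z) \leftarrow (X,Z)}[\mathsf{B}(z) = x] \ge \Pr[\mathsf{B}(z) = x \mid (x,z) \text{ is good}] \cdot \Pr[(x,z) \text{ is good}] > 2^{-1} \cdot 2^{-k+\Delta-2} = 2^{-k+\Delta-3} \qquad (15)$$

Since $\Delta = 3$, this is a prediction probability above $2^{-k}$, which gives the right-hand side of (8) and thus proves the theorem.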

We'll now construct $\mathsf{B}$ satisfying (14). For this, consider any good $(x,z)$. Let $R = R_1^k = (R_1, \ldots, R_k)$ be uniformly random and let $A = A_1^k = (A_1, \ldots, A_k)$ where $A_i = R_i.x$.

Let $\hat{A} \leftarrow \mathsf{A}(z,R)$ and define $\epsilon_i = \Pr_R[\hat{A}_i = A_i \mid \hat{A}^{i-1} = A^{i-1}]$. Using (13) in the last step

$$\prod_{i=1}^{k} \epsilon_i = \Pr_R[\hat{A} = A] = \Pr_R[\mathsf{A}(z,R) = R.x] \ge 2^{-k+\Delta-2}$$
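Thus, there exists an $i$ s.t. $\epsilon_i \ge 2^{(-k+\Delta-2)/k} = 1/2 + \delta$ with $\delta = 2^{(\Delta-2)/k-1} - 1/2 = \Theta(1/k)$. We fix this $i$ (we don't know which $i$ is good, and later will simply try all of them). Then

$$\mathbb{E}_{R^{i-1}}\Big[\Pr_{R_i, R_{i+1}^k}[\hat{A}_i = A_i \mid \hat{A}^{i-1} = A^{i-1}]\Big] \ge 1/2 + \delta$$

Using Markov

$$\Pr_{R^{i-1}}\Big[\Pr_{R_i, R_{i+1}^k}[\hat{A}_i = A_i \mid \hat{A}^{i-1} = A^{i-1}] \ge 1/2 + \delta/2\Big] \ge \delta/2 \qquad (16)$$

We call $r^{i-1}$ good if the following holds (note that by the previous equation a random $r^{i-1}$ is good with probability $\ge \delta/2$):

$$r^{i-1} \text{ is good} \iff \Pr_{R_i, R_{i+1}^k}[\hat{A}_i = A_i \mid \hat{A}^{i-1} = A^{i-1}] \ge 1/2 + \delta/2 \quad (\text{with } R^{i-1} = r^{i-1}) \qquad (17)$$

From now on, we fix some good $r^{i-1}$ and assume we know the correct values $a^{i-1} = (r_1.x, \ldots, r_{i-1}.x)$ (later we'll simply try all possible choices for $a^{i-1}$).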

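We define a predictor $P_i(r_i)$ that tries to predict $r_i.x$ given a random $r_i$ (and also knows $r^{i-1}, a^{i-1}$ as above) as follows

1. Sample random $r_{i+1}^k \leftarrow \{0,1\}^{(k-i) \times n}$.

2. Invoke $\hat{A}^k \leftarrow \mathsf{A}(z, r^k)$. Note that $r^k = (r^{i-1}, r_i, r_{i+1}^k)$ consists of the fixed $r^{i-1}$, the input $r_i$ and the randomly sampled $r_{i+1}^k$.

3. If $\hat{A}^{i-1} = a^{i-1}$ output $\hat{A}_i$, otherwise output $\bot$.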

Using (11), which implies $\Pr[\hat{A}^{i-1} = a^{i-1}] \ge 2^{-i}$, and (17) we can lower bound $P_i$'s rate and advantage as

$$\Pr_{R_i}[P_i(R_i) \ne \bot] = \Pr[\hat{A}^{i-1} = a^{i-1}] \ge 2^{-i},$$
$$\Pr_{R_i}[P_i(R_i) = R_i.x] \ge \Pr[\hat{A}^{i-1} = a^{i-1}]\,(1/2 + \delta/2). \qquad (18)$$

In terms of Theorem 3, we have a binary Hadamard code with $e + c = \Pr[\hat{A}^{i-1} = a^{i-1}]$ and $c - e = \delta \cdot \Pr[\hat{A}^{i-1} = a^{i-1}]$, which implies that $(e + c)/(c - e)^2 \le 2^i/\delta^2$.

Now Theorem 3 implies that given such a predictor $P_i$ we can output a list that contains $x$ with probability $> 0.8$ in time $O(2^i\,\mathrm{poly}(m,n)) = O(2^k\,\mathrm{poly}(m,n))$. As we assume access to an oracle Eq which outputs 1 on input $x$ and 0 otherwise, we can find $x$ in this list with the same probability.

Using this, we can now construct an algorithm $\mathsf{B}$ as claimed in (14) as follows: $\mathsf{B}$ will sample $i \in \{1, \ldots, k\}$ and then $r^{i-1}$ at random. Then $\mathsf{B}$ calls $P_i$ with all possible $a^{i-1} \in \{0,1\}^{i-1}$. We note that with probability $\delta/2k$ (we lose a factor $k$ for the guess of $i$, and $\delta/2$ is the probability of sampling a good $r^{i-1}$) the predictor $P_i$ will satisfy (18).

If $x$ is not found, $\mathsf{B}$ repeats the above process, but stops if $x$ is not found after $2k/\delta$ iterations. The success probability of $\mathsf{B}$ is $\approx (1 - 1/e) \cdot 0.8 > 0.5$ as claimed; the overall running time we get is $O(2^{2k}\,\mathrm{poly}(m,n))$. □
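The estimate $(1 - 1/e) \cdot 0.8 > 0.5$ for the success probability of $\mathsf{B}$ can be checked numerically. The sketch below is a simplification that assumes each iteration independently hits a good pair $(i, r^{i-1})$ with probability exactly $\delta/2k$ and that list decoding then succeeds with probability 0.8; the function name `success_lower_bound` is hypothetical.

```python
import math

def success_lower_bound(k, delta, list_success=0.8):
    """Pr[some iteration is good] * Pr[list decoding succeeds], under the
    simplifying independence assumption stated in the lead-in."""
    p_good = delta / (2 * k)                 # chance one iteration picks a good (i, r^{i-1})
    iterations = math.ceil(2 * k / delta)    # B stops after 2k/delta iterations
    at_least_one_good = 1 - (1 - p_good) ** iterations
    return at_least_one_good * list_success

bounds = [success_lower_bound(k, delta) for k in (4, 16, 64) for delta in (0.2, 0.05, 0.005)]
```

Because $(1-p)^{1/p} \le 1/e$ for every $p \in (0,1]$, the first factor never drops below $1 - 1/e$, which is where the constant in the proof comes from.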
C Proof of Theorem 2


It's sufficient to prove the theorem for $\gamma = 0$; the case $\gamma > 0$ then follows directly by the definition of unpredictability entropy. Suppose for the sake of contradiction that (7) does not hold. That is, $H^{\mathrm{Metric,det},[0,1]}_{\epsilon,t}(X|Z) < k$, which means that there exists a distinguisher $D : \{0,1\}^n \times \{0,1\}^m \to [0,1]$ of size $t$ that satisfies

$$\mathbb{E}D(X,Z) - \mathbb{E}D(Y,Z) \ge \epsilon \quad \forall (Y,Z) : H_\infty(Y|Z) \ge k. \qquad (19)$$

We will show how to construct an efficient algorithm that given $Z$ uses $D$ to predict $X$ with probability at least $2^{-k}$, contradicting (6). The core of the algorithm is the procedure Predictor described below.


Function Predictor(z, D', ℓ)

    Input : z in the support of Z, [0,2]-valued distinguisher D', number of rounds ℓ
    Output: x ∈ {0,1}^n, or ⊥
    b ← 0, i ← 1
    while b = 0 and i ≤ ℓ do
        x ← {0,1}^n
        b ← BernoulliDistribution(D'(x, z)/2)      /* outputs 1 w.p. D'(x,z)/2 */
        if b = 0 then
            i ← i + 1
        else
            return x
        end
    end
    return ⊥


Predictor$(Z, D, \ell)$ samples an element $x \in \{0,1\}^n$ according to some probability distribution. This distribution captures the following intuition: as the advantage $\mathbb{E}D(X,Z) - \mathbb{E}D(Y,Z)$ is positive (as assumed in (19)), we know that $x$ being the correct guess for $X$ is positively correlated with the value $D(x,Z)$. The probability that Predictor$(Z, D, \ell)$ returns some particular value $x$ as guess for $X$ will be linear in $D(x,Z)$.

Predictor$(Z, D, \ell)$ may also output $\bot$, which means it failed to sample an $x$ according to this distribution. The probability of outputting $\bot$ goes exponentially fast to 0 as $\ell$ grows.
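The Predictor procedure is a plain rejection sampler, which the following sketch spells out. It assumes the distinguisher is passed as a Python callable `d_prime(x, z)` with values in $[0,2]$ and that $\bot$ is represented by `None`; these representation choices, and the function name `predictor`, are illustrative and not taken from the paper.

```python
import random

def predictor(z, d_prime, n, rounds, rng=random):
    """Try up to `rounds` times to sample x with probability proportional to d_prime(x, z).

    Returns a tuple of n bits, or None (standing in for the failure symbol)."""
    for _ in range(rounds):
        x = tuple(rng.randrange(2) for _ in range(n))   # uniform candidate
        if rng.random() < d_prime(x, z) / 2:            # accept w.p. D'(x, z) / 2
            return x
    return None

# Illustrative use with a toy distinguisher that favours strings starting with 1.
guess = predictor(z=None, d_prime=lambda x, z: 1.5 * x[0], n=4, rounds=20,
                  rng=random.Random(7))
```

A returned value is therefore distributed proportionally to `d_prime(x, z)`, and the chance of returning `None` shrinks geometrically in `rounds`.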

A toy example: predicting $X$ when $Z$ is empty and $D$ is boolean. Suppose that $\mathbb{E}D(X) - \mathbb{E}D(Y) \ge \epsilon$ for all $Y$ such that $H_\infty(Y) \ge k$. And assume that $D(\cdot)$ is boolean (not real-valued as in our theorem). Then Predictor$(\emptyset, D, \ell)$ will output a guess for $X$ that (if it's not $\bot$) is a random value $x$ satisfying $D(x) = 1$. The probability that this guess for $X$ is correct equals $\mathbb{E}D(X)/|D|$ where $|D| = \sum_x D(x)$. Consider now the distribution $Y$ of min-entropy $k$ that maximizes $\mathbb{E}D(Y)$. We can assume that $Y$ is flat and supported on those $2^k$ elements $x$ for which the value $D(x)$ is the biggest possible. Observe that since $\mathbb{E}D(X) - \mathbb{E}D(Y) > 0$, we have $\mathbb{E}D(Y) < 1$ and since $D$ is boolean, the support of $Y$ contains all the elements $x$ satisfying $D(x) = 1$. Therefore we obtain $\mathbb{E}D(Y) = 2^{-k}|D|$. Now we can estimate the predicting probability from below as follows:

$$\Pr[X \text{ is predicted correctly}] = \frac{\mathbb{E}D(X)}{|D|} \ge \frac{\mathbb{E}D(Y) + \epsilon}{|D|} = 2^{-k} + \frac{\epsilon}{|D|}$$

The above probability holds for $\ell = \infty$, i.e., when the predictor never outputs $\bot$. For efficiency reasons, we must use a finite, and not too big $\ell$. The predictor will output $\bot$ with probability $(1 - 2^{-n}|D|)^{\ell}$ and thus

$$\Pr[\text{we predict } X \text{ in time } O(\ell \cdot \mathrm{time}(D))] \ge \left(2^{-k} + \frac{\epsilon}{|D|}\right)\left(1 - (1 - 2^{-n}|D|)^{\ell}\right)$$

With a little bit of effort one can prove that setting $\ell = 1 + 2^{n-k}/\epsilon \approx 2^{\Delta}/\epsilon$ yields the success probability $2^{-k}$ independently of $|D|$.
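The toy bound can be sanity-checked numerically. The sketch below evaluates the lower bound $(2^{-k} + \epsilon/|D|)(1 - (1 - 2^{-n}|D|)^{\ell})$ for every admissible support size $|D| < 2^k$ with $\ell = 1 + \lceil 2^{n-k}/\epsilon \rceil$. The parameter values and the function name `toy_success_probability` are arbitrary choices made for illustration.

```python
import math

def toy_success_probability(n, k, eps, d_size, rounds):
    """Lower bound on Pr[X is predicted] in the boolean, Z-empty toy case."""
    hit = 2.0 ** (-k) + eps / d_size                           # bound on E D(X) / |D|
    not_failed = 1.0 - (1.0 - d_size / 2.0 ** n) ** rounds     # predictor does not fail
    return hit * not_failed

n, k, eps = 8, 5, 0.05
rounds = 1 + math.ceil(2 ** (n - k) / eps)
worst = min(toy_success_probability(n, k, eps, d, rounds) for d in range(1, 2 ** k))
```

For these parameters `worst` stays above $2^{-k}$, in line with the claim that the choice of $\ell$ works independently of $|D|$.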


Proof in general case - important issues. Unfortunately, what we have proven above cannot be generalized easily to the case considered in the theorem; there are two obstacles. First, in the theorem we consider a conditional distribution $X|Z$ (i.e., the conditional part $Z$ is not empty as above). We cannot simply make the above argument separately for all possible choices $Z = z$ of the conditional part, as we cannot guarantee that the conditional advantages $\epsilon(z) = \mathbb{E}D(X|Z=z, z) - \mathbb{E}D(Y|Z=z, z)$ are all positive; we only know that their average $\epsilon = \mathbb{E}_{z \leftarrow Z}\,\epsilon(z)$ is positive. Second, so far we assumed that $D$ is boolean. This would only prove the theorem where the derived entropy in (7) is against deterministic boolean distinguishers, and this is not enough to conclude that we have the same amount of HILL entropy, as discussed in Section 2.0.4.


Actual proof - preliminaries. For real-valued distinguishers in the conditional case, just invoking Predictor$(Z, D, \ell)$ on a $D$ satisfying (19) will not give a predictor for $X$ with advantage $> 2^{-k}$ in general. Instead, we first have to transform $D$ into a new distinguisher $D'$ that has the same distinguishing advantage, and for which we can prove that the predictor will work.

The way in which we modify $D$ depends on the distribution $Y|Z$ that minimizes the left-hand side of (19). This distribution can be characterized as follows:

Lemma 2 ([Sko15]). Given $D : \{0,1\}^n \times \{0,1\}^m \to [0,1]$ consider the following optimization problem

$$\max_{Y|Z}\ \mathbb{E}D(Y,Z) \quad \text{s.t.} \quad H_\infty(Y|Z) \ge k \qquad (20)$$

The distribution $Y|Z = Y^*|Z$ satisfying $H_\infty(Y^*|Z) = k$ is optimal for (20) if and only if there exist real numbers $t(z)$ and a number $\lambda \ge 0$ such that for every $z$

(a) $\sum_x \max(D(x,z) - t(z), 0) = \lambda$

(b) If $0 < P_{Y^*|Z=z}(x) < \max_{x'} P_{Y^*|Z=z}(x')$ then $D(x,z) = t(z)$.

(c) If $P_{Y^*|Z=z}(x) = 0$ then $D(x,z) \le t(z)$.

(d) If $P_{Y^*|Z=z}(x) = \max_{x'} P_{Y^*|Z=z}(x')$ then $D(x,z) \ge t(z)$.
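For intuition, the characterization can be checked on a tiny instance in the special case where $Z$ is empty and the maximizer is taken to be flat on the $2^k$ points with the largest $D$-values. The sketch below is an illustration under that assumption only; the helper names `flat_maximizer` and `expected_d` are hypothetical, and the threshold is taken to be the smallest $D$-value inside the support.

```python
from itertools import combinations

def flat_maximizer(d_values, k):
    """Flat distribution on the 2^k largest D-values, with threshold t and area lam."""
    order = sorted(range(len(d_values)), key=lambda x: d_values[x], reverse=True)
    support = set(order[: 2 ** k])
    t = min(d_values[x] for x in support)              # threshold separating the support
    lam = sum(max(v - t, 0.0) for v in d_values)       # quantity in condition (a)
    return support, t, lam

def expected_d(d_values, support):
    """E D(Y) for Y uniform on `support`."""
    return sum(d_values[x] for x in support) / len(support)

d_values = [0.9, 0.1, 0.5, 0.7]
support, t, lam = flat_maximizer(d_values, k=1)
best_flat = max(expected_d(d_values, set(c)) for c in combinations(range(4), 2))
```

Here points outside the support have $D(x) \le t$ and points inside have $D(x) \ge t$, matching conditions (c) and (d), and no other flat distribution of min-entropy 1 achieves a larger expectation.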
Proof. The proof is a straightforward application of the Kuhn-Tucker conditions and is given in Appendix D. □

Remark 2. The characterization can be illustrated in an easy and elegant way. First, it says that the area under the graph of $D(x,z)$ and above the threshold $t(z)$ is the same, no matter what $z$ is (see Figure 3).

Figure 3: For every $z$, the (green) area under $D(\cdot,z)$ and above $t(z)$ equals $\lambda$.

Second, for every $z$ the distribution $Y^*|Z = z$ is flat over the set $\{x : D(x,z) > t(z)\}$ and vanishes for $x$ satisfying $D(x,z) < t(z)$, see Fig. 4.

Figure 4: Relation between distinguisher $D(x,z)$, threshold $t(z)$ and distribution $Y^*|Z = z$.

Note that because of the "freedom" in defining the distribution on elements $x$ satisfying $D(x,z) = t(z)$ (Lemma 2, point (b)), there could be many distributions $Y^*|Z$ corresponding to fixed numbers $\lambda$ and $t(z)$ that satisfy the characterization above, and in this way are optimal for (20) with $k = H_\infty(Y^*|Z)$. For the sake of completeness we characterize below all possible values of $k$ that match $\lambda$ and $t(z)$. We note that this fact might be used to modify our nonuniform guessing algorithm into a uniform one.
Corollary 1. Let $D : \{0,1\}^n \times \{0,1\}^m \to [0,1]$ and $\lambda \in (0,1)$. Let $t(z) = t(\lambda,z)$ be the unique numbers that satisfy the condition (a) in Lemma 2. Define

$$k(\lambda) = n - \log\left(\mathbb{E}_{z \leftarrow Z}\left[1/\Pr(D(U,z) \ge t(z))\right]\right), \qquad (21)$$

which is a non-decreasing right continuous function of $\lambda$. Let $k^-(\lambda) = \lim_{\lambda' \to \lambda^-} k(\lambda')$ and $k^+(\lambda) = \lim_{\lambda' \to \lambda^+} k(\lambda') = k(\lambda)$ be the one-sided limits. Then for every $Y^*|Z$ of min-entropy $k = H_\infty(Y^*|Z)$ fulfilling (b), (c) and (d) we have $k^- \le k \le k^+$. Conversely, if $k$ satisfies $k^- \le k \le k^+$ then there exists a distribution $Y^*|Z$ fulfilling (b), (c) and (d) such that $H_\infty(Y^*|Z) = k$.


Predicting given the thresholds $t(z)$. We use the numbers $t(z)$ to modify $D$ and then we call the procedure Predictor on the modified distinguisher. Lemma 3 below shows that we could efficiently predict $X$ from $Z$, assuming we knew the numbers $t(z)$ for all $z$ in the support of $Z$ (later, we'll show how to efficiently approximate them).

Lemma 3. Let $Y^*|Z$ be the distribution satisfying $H_\infty(Y^*|Z) = k$ and maximizing $\mathbb{E}D(Y,Z)$ over $H_\infty(Y|Z) \ge k$, where $k < n$ and $D$ satisfies (19). Let $t(z)$ be as in Lemma 2. Define

$$D'(x,z) = \max(D(x,z) - t(z), 0) \qquad (22)$$

and set $\ell = 2 \cdot 2^{n-k}\epsilon^{-1}$ in the algorithm Predictor. Then we have

$$\Pr_{x,z}(\mathrm{Predictor}(Z, D', \ell) = X) \ge 2^{-k}\left(1 + 2^{k-n}\epsilon\right) \qquad (23)$$

Proof. We start by calculating the probability on the left-hand side of (23).

Claim 1. For any $D'$ (we will only use the claim for the distinguisher $D'$ as constructed above, but the claim holds in general), the algorithm Predictor outputs $X$ given $Z = z$ with probability

$$\Pr_{x,z}(\mathrm{Predictor}(Z, D', \ell) = X \mid Z = z) = 2^{-n-1}\, g\!\left(\frac{\mathbb{E}D'(U,z)}{2}\right) \cdot \mathbb{E}D'(X|Z=z, z) \qquad (24)$$

where $U$ is uniform over $\{0,1\}^n$ and $g$ is defined by $g(d) = \frac{1 - (1-d)^{\ell}}{d}$ (so $g(d) \approx 1/d$ for large $\ell$).

Proof of Claim. It is easy to observe that

$$\Pr[\mathrm{Predictor}(z, D', \ell) = x \mid \mathrm{Predictor}(z, D', \ell) \ne \bot] = \frac{D'(x,z)}{\sum_{x'} D'(x',z)} \qquad (25)$$

In turn, for every round $i = 1, \ldots, \ell$ of the execution, the probability that Predictor stops and outputs $x'$ is equal to $\Pr[U = x']\,D'(x',z)/2 = 2^{-n-1}D'(x',z)$; the probability that it outputs anything (and thus leaves the while loop) is thus $\sum_{x'} \Pr[U = x'] \cdot D'(x',z)/2 = \mathbb{E}D'(U,z)/2$. The probability of not leaving the while loop for $\ell$ rounds (in this case the output is $\bot$) is $(1 - \mathbb{E}D'(U,z)/2)^{\ell}$, so

$$\Pr[\mathrm{Predictor}(z, D', \ell) \ne \bot] = 1 - \left(1 - \frac{\mathbb{E}D'(U,z)}{2}\right)^{\ell} \qquad (26)$$
Combining the last two formulas we obtain

$$\Pr[\mathrm{Predictor}(z, D', \ell) = x] = 2^{-n-1}\, g(\mathbb{E}D'(U,z)/2) \cdot D'(x,z) \qquad (27)$$

Hence

$$\begin{aligned}
\Pr[\mathrm{Predictor}(z, D', \ell) = X \mid Z = z] &= \sum_x \Pr[\mathrm{Predictor}(z, D', \ell) = x,\ X = x \mid Z = z] \\
&= \sum_x \Pr[\mathrm{Predictor}(z, D', \ell) = x]\,\Pr[X = x \mid Z = z] \\
&= 2^{-n-1}\, g(\mathbb{E}D'(U,z)/2) \sum_x D'(x,z)\,\Pr[X = x \mid Z = z] \\
&= 2^{-n-1}\, g(\mathbb{E}D'(U,z)/2)\,\mathbb{E}D'(X|Z=z, z)
\end{aligned} \qquad (28)$$

and the claim follows. □
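The closed form for the output distribution of Predictor can be verified on a small example by summing the per-round probabilities directly. The sketch below uses exact rational arithmetic and compares the round-by-round sum with $2^{-n-1}\,g(\mathbb{E}D'(U)/2)\,D'(x)$ where $g(d) = (1 - (1-d)^{\ell})/d$; the table `values` is an arbitrary $[0,2]$-valued example, and all function names are hypothetical.

```python
from fractions import Fraction

def g(d, rounds):
    """g(d) = (1 - (1 - d)^rounds) / d, for d != 0."""
    return (1 - (1 - d) ** rounds) / d

def round_by_round(d_values, x, rounds):
    """Pr[output = x], summed over the round in which the loop stops."""
    size = len(d_values)                                     # plays the role of 2^n
    p_x = Fraction(d_values[x]) / (2 * size)                 # stop with x in a given round
    q = sum(Fraction(v) for v in d_values) / (2 * size)      # stop at all in a given round
    return sum((1 - q) ** i * p_x for i in range(rounds))

def closed_form(d_values, x, rounds):
    """2^{-n-1} * g(E D'(U) / 2) * D'(x)."""
    size = len(d_values)
    mean = sum(Fraction(v) for v in d_values) / size
    return Fraction(1, 2 * size) * g(mean / 2, rounds) * Fraction(d_values[x])

values = [Fraction(1, 2), Fraction(0), Fraction(3, 2), Fraction(1)]
agree = all(round_by_round(values, x, 5) == closed_form(values, x, 5) for x in range(4))
```

The agreement is exact because the round-by-round sum is a finite geometric series with ratio $1 - \mathbb{E}D'(U)/2$.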

Now we can see why we cannot apply the algorithm Predictor using the distinguisher $D$ satisfying only (19) directly. According to the last formula, the success probability would be an averaged sum of products $g(\mathbb{E}D(U,z)) \cdot \mathbb{E}D(X|Z=z, z)$ over $z$. We know the average of the second factors of these products, but in general cannot compare the values of $\mathbb{E}D(U,z)$ for different $z$'s. The crucial observation is that the distinguisher $D'$ we defined satisfies the same inequality (19) as $D$ (though $D'$ has the range $[0,2]$, not $[0,1]$ as $D$). Moreover $D'$ has a special form which allows us to simplify expression (23). The details are given in the next two claims.

Claim 2. We have $\mathbb{E}D'(X,Z) - \mathbb{E}D'(Y,Z) \ge \epsilon$ for all $Y|Z$ : $H_\infty(Y|Z) \ge k$.

Proof of Claim. We argue that (a): $\mathbb{E}D'(X,Z) - \mathbb{E}D'(Y^*,Z) \ge \mathbb{E}D(X,Z) - \mathbb{E}D(Y^*,Z)$ and (b): $Y^*|Z$ maximizes $\mathbb{E}D'(Y,Z)$ over $H_\infty(Y|Z) \ge k$. For the proof of (a), observe that by (22) we have $D'(x,z) \ge D(x,z) - t(z)$ for every $x$ and $z$. Hence $\mathbb{E}D'(X|Z=z, z) \ge \mathbb{E}D(X|Z=z, z) - t(z)$. Moreover, if $D(x,z) - t(z) < 0$ then Lemma 2 implies $P_{Y^*|Z=z}(x) = 0$ and thus $\mathbb{E}D'(Y^*|Z=z, z) = \mathbb{E}D(Y^*|Z=z, z) - t(z)$. Hence, for all $z$ we have

$$\mathbb{E}D'(X|Z=z, z) - \mathbb{E}D'(Y^*|Z=z, z) \ge \mathbb{E}D(X|Z=z, z) - \mathbb{E}D(Y^*|Z=z, z)$$

The proof of (a) follows now by taking the average over $z$. The proof of (b) follows by observing that $D'$ satisfies the characterization in Lemma 2 with $t(z) = 0$ for all $z$. □

Claim 3. There exists a number $\lambda' \in (0,1)$ such that $\mathbb{E}D'(U,z) = \lambda'$ for every $z$.

Proof. Lemma 2 implies $\sum_x D'(x,z) = \lambda$ for every $z$. We can define $\lambda' = 2^{-n}\lambda$ and then it remains to show $\lambda < 2^n$ and $\lambda > 0$. Observe that the case $t(z) < 0$ in Lemma 2 is possible if and only if $P_{Y^*|Z=z}(x) = \max_{x'} P_{Y^*|Z=z}(x')$ for all $x$, which means $H_\infty(Y^*|Z=z) = n$. Since $k < n$, we have $t(z) \ge 0$ for at least one $z$ and then $\lambda = \sum_x \max(D(x,z) - t(z), 0) \le \sum_x D(x,z)$, which essentially means $\lambda \le 2^n$. Lemma 2 guarantees that $\lambda \ge 0$, therefore we need to show that $\lambda \notin \{0, 2^n\}$. Observe that if $\lambda = 0$ then the condition $\sum_x D'(x,z) = \lambda$ implies $D'(x,z) = 0$ for all $x$ and $z$, contradicting Claim 2 because $\epsilon > 0$. In turn, if $\lambda = 2^n$ then from Lemma 2 we get $D(\cdot,z) = 1$ and $t(z) = 0$ for all $z$ such that $t(z) \ge 0$. This is possible only if $P_{Y^*|Z=z}(x) = \max_{x'} P_{Y^*|Z=z}(x')$ for all $x$, which means $H_\infty(Y^*|Z=z) = n$ if $t(z) \ge 0$. But then $H_\infty(Y^*|Z=z) = n$ for all $z$, which contradicts $k < n$. □
To calculate the success probability we need one more observation. The following claim shows that the support of $D'$ is contained in the support of $Y^*$.

Claim 4. For every $z$ we have

$$\mathbb{E}D'(Y^*|Z=z, z) = \mathbb{E}D'(U,z) \cdot 2^n \max_{x'} P_{Y^*|Z=z}(x'). \qquad (29)$$

Proof of Claim. By Lemma 2, $D(x,z) > t(z)$ only if $P_{Y^*|Z=z}(x) = \max_{x'} P_{Y^*|Z=z}(x')$, therefore

$$\begin{aligned}
\mathbb{E}D'(Y^*|Z=z, z) &= \sum_x \max(D(x,z) - t(z), 0)\,P_{Y^*|Z=z}(x) \\
&= \sum_x \max(D(x,z) - t(z), 0)\,\max_{x'} P_{Y^*|Z=z}(x'),
\end{aligned}$$

and the claim follows by the definition of $D'$. □

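Now we are ready to prove the main result. From Claim 1 and Claim 3 we obtain

$$\begin{aligned}
\Pr_{x,z}(\mathrm{Predictor}(Z, D', \ell) = X) &= 2^{-n-1}\,\mathbb{E}_{z \leftarrow Z}\left[g(\lambda'/2) \cdot \mathbb{E}D'(X|Z=z, z)\right] \\
&= 2^{-n-1}\,g(\lambda'/2) \cdot \mathbb{E}D'(X,Z)
\end{aligned} \qquad (30)$$

Claim 2 applied to $Y = Y^*$ yields now the following estimate

$$\Pr_{x,z}(\mathrm{Predictor}(Z, D', \ell) = X) \ge 2^{-n-1}\,g(\lambda'/2) \cdot \left(\mathbb{E}D'(Y^*,Z) + \epsilon\right). \qquad (31)$$

Observe that Claim 4, Claim 3, and $H_\infty(Y^*|Z) = k$ imply

$$\begin{aligned}
\mathbb{E}D'(Y^*,Z) &= \mathbb{E}_{z \leftarrow Z}\left[\mathbb{E}D'(Y^*|Z=z, z)\right] = \mathbb{E}_{z \leftarrow Z}\left[\mathbb{E}D'(U,z) \cdot 2^n \max_{x'} P_{Y^*|Z=z}(x')\right] \\
&= 2^n\lambda' \cdot \mathbb{E}_{z \leftarrow Z} \max_{x'} P_{Y^*|Z=z}(x') \\
&= 2^{n-k}\lambda'
\end{aligned} \qquad (32)$$

Plugging this into (31) we get the following bound

$$\begin{aligned}
\Pr_{x,z}(\mathrm{Predictor}(Z, D', \ell) = X) &\ge 2^{-n-1}\,g(\lambda'/2) \cdot \left(2^{n-k}\lambda' + \epsilon\right) \\
&= 2^{-k}\left(1 - (1 - \lambda'/2)^{\ell}\right)\left(1 + \frac{2^{k-n}\epsilon}{\lambda'}\right)
\end{aligned} \qquad (33)$$

To give a lower bound on the success probability it remains to minimize the last expression over $\lambda' \in (0,1)$. This is answered below.

Claim 5. Let $h(s) = (1 - (1-s)^{\ell})(1 + a s^{-1})$, where $a > 0$ and $\ell \ge 1 + a^{-1}$. Then $h(s) \ge h(1) = 1 + a$ for all $s \in [0,1]$.

Proof of Claim. The proof uses standard calculus and is given in the appendix. □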
Computing $t(z)$ from $\lambda$. So far, we have shown how to construct the predicting algorithm provided that we are given the numbers $t(z)$. Now we will prove that one can compute them approximately and use the approximations successfully in place of the original ones. We start with a few useful facts about the auxiliary function $g$ already introduced in Claim 1 in the proof of Lemma 3. Below we summarize its fundamental properties.

Lemma 4. For $\ell > 1$ the function $g(d) = \frac{1 - (1-d)^{\ell}}{d}$ on $[0,1]$ satisfies:

(a) $g$ is continuous at 0 and decreasing

(b) $g$ is convex

(c) for any $d_2 > d_1$ we have $g(d_2) \ge g(d_1)\left(1 - \frac{\ell}{2} \cdot |d_2 - d_1|\right)$

Proof of Lemma. The proof uses elementary calculus and is deferred to the appendix. □
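The three properties of $g(d) = (1 - (1-d)^{\ell})/d$ can be spot-checked numerically on a grid. The sketch below is only a sanity check, not a proof; it reads property (c) as $g(d_2) \ge g(d_1)(1 - \frac{\ell}{2}|d_2 - d_1|)$ and extends $g$ to $d = 0$ by its limit $\ell$. The grid size and tolerance are arbitrary choices, and `check_lemma4` is a hypothetical helper name.

```python
def g(d, rounds):
    """g(d) = (1 - (1 - d)^rounds) / d, extended by its limit g(0) = rounds."""
    return float(rounds) if d == 0 else (1.0 - (1.0 - d) ** rounds) / d

def check_lemma4(rounds, steps=200, tol=1e-9):
    """Return (decreasing, convex, property_c) evaluated on a uniform grid of [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    vals = [g(d, rounds) for d in grid]
    decreasing = all(a >= b - tol for a, b in zip(vals, vals[1:]))
    # Non-negative second differences are a discrete stand-in for convexity.
    convex = all(vals[i - 1] + vals[i + 1] - 2 * vals[i] >= -tol for i in range(1, steps))
    property_c = all(
        vals[j] >= vals[i] * (1 - rounds / 2 * (grid[j] - grid[i])) - tol
        for i in range(steps + 1) for j in range(i + 1, steps + 1)
    )
    return decreasing, convex, property_c

results = {rounds: check_lemma4(rounds) for rounds in (2, 5, 20)}
```

Writing $g(d) = \sum_{j=0}^{\ell-1}(1-d)^j$ makes monotonicity and convexity visible term by term, which is what the grid check reflects.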

The entire solution is based on the next two lemmas. The first lemma is based on the intuition that replacing $D$ by a distinguisher which approximates it closely enough should not affect the success probability of Predictor$(Z, D, \ell)$ very much. For technical reasons we present this statement assuming a one-sided $\delta$-approximation. The second lemma describes an efficient algorithm which obtains $\lambda$ as a hint on its input and computes approximations for $t(z)$ from below, for every $z$.

Lemma 5. Let $D_1, D_2 : \{0,1\}^n \times \{0,1\}^m \to [0,1]$ be any two functions satisfying

(a) $D_2(x,z) \ge D_1(x,z)$ for all $x, z$

(b) $\mathbb{E}D_2(U,z) - \mathbb{E}D_1(U,z) \le \delta$ for all $z$

Then we have

$$\Pr(\mathrm{Predictor}(Z, D_2, \ell) = X) \ge (1 - \ell\delta/2)\,\Pr(\mathrm{Predictor}(Z, D_1, \ell) = X) \qquad (34)$$

Proof of Lemma. We have

$$\begin{aligned}
\Pr(\mathrm{Predictor}(z, D_2, \ell) = X \mid Z = z) &= g(\mathbb{E}D_2(U,z))\,\mathbb{E}D_2(X|Z=z, z) \\
&\ge g(\mathbb{E}D_2(U,z))\,\mathbb{E}D_1(X|Z=z, z),
\end{aligned} \qquad (35)$$

where the inequality follows from $D_2 \ge D_1 \ge 0$. The assumptions (a) and (b) imply $|\mathbb{E}D_1(U,z) - \mathbb{E}D_2(U,z)| \le \delta$ for every $z$. From property (c) in Lemma 4 it follows that

$$g(\mathbb{E}D_2(U,z)) \ge g(\mathbb{E}D_1(U,z))\,(1 - \ell\delta/2) \qquad (36)$$

for every $z$. Combining the last two estimates we get

$$\begin{aligned}
\Pr(\mathrm{Predictor}(z, D_2, \ell) = X \mid Z = z) &\ge (1 - \ell\delta/2) \cdot g(\mathbb{E}D_1(U,z))\,\mathbb{E}D_1(X|Z=z, z) \\
&= (1 - \ell\delta/2) \cdot \Pr(\mathrm{Predictor}(z, D_1, \ell) = X \mid Z = z)
\end{aligned}$$

Taking the average over $z \leftarrow Z$ completes the proof. □
Lemma 6. Let $D : \{0,1\}^n \to [0,1]$ be any function computable in time $s$, let $\lambda \in (0,1)$ and let $t \in [0,1]$ be a number such that $\mathbb{E}\max(D(U) - t, 0) = \lambda$. There exists a probabilistic algorithm FindThreshold$(D, \lambda, \delta, N)$ that runs in time $O(\log(1/\delta)\,N \cdot \mathrm{time}(D))$ and with probability at least $1 - 2\log(12/\delta)\,e^{-N\delta^2/3}$ outputs a number $t'$ such that $\mathbb{E}\max(D(U) - t', 0) \in [\lambda, \lambda + \delta]$. In particular, $t' \le t$.

Proof of Lemma. The idea is pretty simple: given $t'$ we approximate the value $\mathbb{E}\max(D(U) - t', 0)$ by sampling, and by comparing the result with $\lambda$ we can find the right value of $t'$ using binary search. This corresponds to finding a blue line on Fig. 4 such that the green area above is sufficiently close to $\lambda$.

Function FindThreshold(D, A, 5, A^) 


Input : D : {0,1}"" —)■ [0,1], A G (0,1), parameters <5, N 
Output: t' such that Emax(D(f7) — t' , 0) G [A, A + <5] 

1 t i — —1, t~^ i — 1 

2 repeat 

3 


t' ^ {t- + t+)/2 

Xi, . . . , Xat ■(— [/ 
y . AT-l 


if A' > A + ^ then 


t+ 


t' 

' < 
t' 


t 


else if A' < A + I then 


4 

5 

6 

7 

8 
9 

10 

11 

12 


13 until t~^ — t ^ ^ 

14 if < — 1 + ^ then 

15 ' ' 


N ^ X]7=i — t', 0) 


/* fresh values every time */ 
/* A'PS Emax(D([/) — tj, 0) */ 


else 


return t' 


end 


f ^ -1 


16 return t' 


The function $h(t') = \mathbb{E}\max(D(U) - t', 0)$ is clearly non-increasing with respect to $t'$ and changes from $1 + \mathbb{E}D(U)$ at $t' = -1$ to 0 for $t' = 1$. Moreover, it is strictly decreasing in a small neighborhood of $t' = t$ and for all $t' < t$. Indeed, since $\lambda > 0$ there is at least one $x$ such that $D(x) > t$. Taking $t' < t'' \le \min_{x : D(x) > t} D(x)$ we see that $h(t') - h(t'') \ge 2^{-n}(t'' - t') > 0$. Hence, $t' > t$ implies $\mathbb{E}\max(D(U) - t', 0) < \mathbb{E}\max(D(U) - t, 0) = \lambda$. This proves the second part of the statement.
Denote by X'-,t^,t~,tf the values assigned in round i to X',t',t~,t~^ respectively. Observe that by 
the Chernoff Bound^^ and the union bound over at most log(12/(5) rounds of the execution, with 
probability p = 1 — 21og(12/h) exp(—iV(5^/3) we have |A^ — h{ti)\ < ^ for every round i. Note that 
with the same probability the algorithm satisfies the invariant property: if there is tg £ ^'^t] 

such that /i(to) G [-^ + ff j -^ + §] the algorithm jumps to round i + 1 then tg £ • 

Suppose that /i(tg) G [^ + f|) -^ + yf] for some tg G [—1,1]. Now we have two possibilities: either 

We use the following version: let $X_1, \ldots, X_N$ be $[0,1]$-valued independent random variables, let $X = \sum_{i=1}^{N} X_i$ and $\mu = \mathbb{E}X$. Then $\Pr(|X - \mu| > \delta\mu) < 2\exp(-\mu\delta^2/3)$.
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we terminate with tj such that A* G + i’ + t] which means h{ti) G [-^ + -^ + ff] we 

are done, or we will eventually find such to up to an error Since \h{t 2 ) — h{ti)\ ^ \t 2 —ti\ for any 
ti,t 2 , the returned number t' satisfies h{to) — ^ ^ ^ ^(fo) + hr particular it satisfies the 

desired inequality. It remains to consider the case when either h{t) < A + for all t or h{t) > A + y|. 
Since h{l) = 0 the second is clearly impossible. In the first case we have h{t) ^ h(—1) < A + 
which means that in every round i we have = — 1 and either we terminate with ti such that 
A^ G [A + |, A + ^] which means h{ti) G [-^ + f|) -^ + yf] und we are done, or in every round i we 
do the assignment = ti which yields ti = —1 + and the main loop halts with ti < —1 + 

The algorithm outputs then —1 which satisfies the desired inequality, because of the assumption 
/i(—1) < A + II and the trivial inequality h{—l) ^ 1 ^ A. □ □ 
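The binary-search idea behind FindThreshold can be sketched as follows. This is an illustration and not the paper's exact procedure: the acceptance window $[\lambda + \delta/3, \lambda + 2\delta/3]$ and the stopping width $\delta/6$ are constants assumed here for concreteness, the estimator is passed in as a callable, and the names `find_threshold`, `exact_h` and `sampled_h` are hypothetical.

```python
import random

def find_threshold(estimate_h, lam, delta):
    """Binary search on [-1, 1] for t' with h(t') close to (and not below) lam.

    estimate_h(t) should approximate h(t) = E max(D(U) - t, 0), which is
    non-increasing in t. Window and stopping constants are assumptions."""
    lo, hi = -1.0, 1.0
    while hi - lo > delta / 6:
        t = (lo + hi) / 2
        value = estimate_h(t)
        if value > lam + 2 * delta / 3:
            lo = t                      # area too large: raise the threshold
        elif value < lam + delta / 3:
            hi = t                      # area too small: lower the threshold
        else:
            return t
    return lo

def exact_h(d_values):
    """h computed exactly from a full table of D-values (only feasible for tiny domains)."""
    return lambda t: sum(max(v - t, 0.0) for v in d_values) / len(d_values)

def sampled_h(d_values, n_samples, rng):
    """h estimated from fresh uniform samples on every call."""
    def estimate(t):
        draws = [rng.choice(d_values) for _ in range(n_samples)]
        return sum(max(v - t, 0.0) for v in draws) / n_samples
    return estimate

d_values = [0.1, 0.5, 0.9, 0.3]
t_exact = find_threshold(exact_h(d_values), lam=0.2, delta=0.06)
t_sampled = find_threshold(sampled_h(d_values, 20000, random.Random(0)), lam=0.2, delta=0.06)
```

With the exact estimator the search returns a threshold whose area lies inside $[\lambda, \lambda + \delta]$; with the sampled estimator this holds only with high probability, which is what the Chernoff argument in the proof quantifies.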

Let $D'$ be as in Lemma 3. Let $t'(z) = \mathrm{FindThreshold}(D(\cdot,z), \lambda, \delta, N)$ and define $D''(x,z) = \max(D(x,z) - t'(z), 0)$. Denote by $\Pr[\mathrm{bad}]$ the probability that $\mathbb{E}D''(U,z) \notin [\lambda, \lambda + \delta]$ (i.e. the probability of failure of the algorithm FindThreshold). If the event bad doesn't occur then $D'' \ge D'$ and $\mathbb{E}D''(U,z) \le \mathbb{E}D'(U,z) + \delta$. Applying Lemma 3 and Lemma 5 we obtain

$$\Pr[\mathrm{Predictor}(z, D'', \ell) = X] \ge 2^{-k}\left(1 + 2^{k-n}\epsilon\right) \cdot \left(1 - \ell\delta/2\right) \cdot \Pr[\neg\mathrm{bad}] \qquad (37)$$

By the elementary inequality $(1 + s)(1 - s/4)^2 \ge 1$ valid for $s \in [0,1]$, for this probability to be bigger than $2^{-k}$ it is enough to require

$$\ell\delta/2 \le 2^{k-n}\epsilon/4 \qquad (38)$$

$$2\log(12/\delta)\exp(-N\delta^2/3) \le 2^{k-n}\epsilon/4 \qquad (39)$$

The solution for the first inequality is $\delta = O(2^{2(k-n)}\epsilon^2)$, which implies $\delta \ll \epsilon$. The second one gives us $N = \Omega\left((1/\delta)^2(\log\log(1/\delta) + n - k + \log(1/\epsilon))\right)$, which can be simplified to $N = \Omega\left((1/\delta)^2\log(1/\delta)\right)$. The total running time is (up to a constant factor) the time needed for $O(\ell \cdot N\log(1/\delta)) = O\left((2^{\Delta}/\epsilon)^5\log^2(2^{\Delta}/\epsilon)\right)$ invocations of the distinguisher $D$.
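To see how the parameters scale, conditions (38) and (39) can be solved for concrete numbers. The sketch below assumes $\ell = 2 \cdot 2^{\Delta}/\epsilon$, takes (38) with equality, reads the logarithm in (39) as base 2, and picks the smallest integer $N$ satisfying (39); these are illustrative assumptions, and `parameters` is a hypothetical helper.

```python
import math

def parameters(gap, eps):
    """Return (ell, delta, n_samples) for entropy gap `gap` = n - k and advantage eps."""
    ell = 2 * 2 ** gap / eps
    target = 2.0 ** (-gap) * eps / 4        # right-hand side of (38) and (39)
    delta = 2 * target / ell                # (38) with equality: ell * delta / 2 = target
    # Smallest N with 2 * log2(12/delta) * exp(-N * delta^2 / 3) <= target.
    n_samples = math.ceil(3 / delta ** 2 * math.log(2 * math.log2(12 / delta) / target))
    return ell, delta, n_samples

ell, delta, n_samples = parameters(gap=2, eps=0.5)
```

Even for a gap of 2 bits and $\epsilon = 1/2$ this gives roughly $1.3 \cdot 10^6$ samples per threshold estimate, which illustrates why the result is meaningful only for small entropy gaps.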

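D Proof of Lemma 2

Proof. Consider the following linear optimization program over the variables $(P_{x,z})_{x,z}$ and $(a_z)_z$:

$$\begin{aligned}
\text{maximize} \quad & \sum_{x,z} D(x,z)\,P_{x,z} \\
\text{subject to} \quad & -P_{x,z} \le 0, && (x,z) \in \{0,1\}^n \times \{0,1\}^m \\
& \sum_x P_{x,z} - P_Z(z) = 0, && z \in \{0,1\}^m \\
& P_{x,z} - a_z \le 0, && (x,z) \in \{0,1\}^n \times \{0,1\}^m \\
& \sum_z a_z - 2^{-k} \le 0
\end{aligned} \qquad (40)$$

This problem is equivalent to (20) if we define $P_{Y,Z}(x,z) = P_{x,z}$ and replace the condition $\sum_z \max_x P_{Y,Z}(x,z) \le 2^{-k}$, which is equivalent to $H_\infty(Y|Z) \ge k$, by the existence of numbers $a_z \ge \max_x P_{Y,Z}(x,z)$ such that $\sum_z a_z \le 2^{-k}$. The solutions of (40) can be characterized as follows:

Claim 6. The numbers $(P_{x,z})_{x,z}$, $(a_z)_z$ are optimal for (40) if and only if there exist numbers $\lambda^1(x,z) \ge 0$, $\lambda^2(z) \in \mathbb{R}$, $\lambda^3(x,z) \ge 0$, $\lambda^4 \ge 0$ such that

(a) $D(x,z) = -\lambda^1(x,z) + \lambda^2(z) + \lambda^3(x,z)$ and $0 = -\sum_x \lambda^3(x,z) + \lambda^4$

(b) We have $\lambda^1(x,z) = 0$ if $P_{x,z} > 0$, $\lambda^3(x,z) = 0$ if $P_{x,z} < a_z$, and $\lambda^4 = 0$ if $\sum_z a_z < 2^{-k}$.

Proof of Claim. This is a straightforward application of the KKT conditions. □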

It remains to apply and simplify the last characterization. Let $(P^*_{x,z})_{x,z}$, $(a^*_z)_z$ be optimal for (40), where $P^*(x,z) = P_{Y^*,Z}(x,z)$, and let $\lambda^1(x,z)$, $\lambda^2(z)$, $\lambda^3(x,z)$, $\lambda^4$ be the corresponding multipliers given by the last claim. Define $t(z) = \lambda^2(z)$ and $\lambda = \lambda^4$. Observe that for every $z$ we have $a^*_z \ge \max_x P^*(x,z) \ge 2^{-n}P_Z(z) > 0$ and thus for every $(x,z)$ we have

$$\lambda^1(x,z) \cdot \lambda^3(x,z) = 0 \qquad (41)$$

If $P^*(x,z) = 0$ then $P^*(x,z) < a^*_z$ and $\lambda^3(x,z) = 0$, hence $D(x,z) \le t(z)$, which proves (c). If $P^*(x,z) = \max_{x'} P^*(x',z)$ then $P^*(x,z) > 0$ and $\lambda^1(x,z) = 0$, which proves (d). Finally observe that (41) implies

$$\max(D(x,z) - t(z), 0) = \max(-\lambda^1(x,z) + \lambda^3(x,z), 0) = \lambda^3(x,z)$$

Hence, the condition $\sum_x \lambda^3(x,z) = \lambda^4 = \lambda$ proves (a).

Suppose now that the characterization given in the Lemma is satisfied. Define $P^*(x,z) = P_{Y^*,Z}(x,z)$ and $a_z = \max_x P_{Y^*,Z}(x,z)$, let $\lambda^3(x,z) = \max(D(x,z) - t(z), 0)$, $\lambda^1(x,z) = \max(t(z) - D(x,z), 0)$ and $\lambda^4 = \lambda$. We will show that these numbers satisfy the conditions described in the last claim. By definition we have $-\lambda^1(x,z) + \lambda^2(z) + \lambda^3(x,z) = D(x,z)$, and by the assumptions we get $\sum_x \lambda^3(x,z) = \lambda = \lambda^4$. This proves part (a). Now we verify the conditions in (b). Note that $D(x,z) < t(z)$ is possible only if $P_{Y^*|Z=z}(x) = 0$, and $D(x,z) > t(z)$ is possible only if $P_{Y^*|Z=z}(x) = \max_{x'} P_{Y^*|Z=z}(x')$. Therefore, if $P_{Y^*,Z}(x,z) > 0$ then we must have $D(x,z) \ge t(z)$, which means that $\lambda^1(x,z) = 0$. Similarly, if $P_{Y^*,Z}(x,z) < \max_x P_{Y^*,Z}(x,z)$ then $D(x,z) \le t(z)$ and $\lambda^3(x,z) = 0$. Finally, since we assume $H_\infty(Y^*|Z) = k$ we have $\sum_z a_z = 2^{-k}$ and thus there are no additional restrictions on $\lambda^4$. □


E Proof of Corollary 1


of Corollary. Let ymax(^) = max^,/ Py\z=z{x')- Consider the fnnction 


fz{x) 


ymax(^) + d, 

l-#{x: D'(x.z)>t(z)}-(y 

max +^) 


#{x: D'(x,z)=t(z)} 


0 , 


D'(x, z) > t(z) 
D'(x, z) = t(z) 
D'(x, z) < t(z) 


This fnnction defines a distribntion that satisfies 


fz{x) ^ max/f(x') Vx : D'(x, 2;) ^ t{z) 

x' 


(42) 


(43) 
if and only if $\delta$ satisfies


1 


1 


#{x: D'(x.z) ^ t(z)} 


^ ymax(z) + 5 ^ 


#{x: D'(x.z) > t(z)} 


(44) 


In particular these conditions are satisfied for <5 = 0. Suppose now that there are Zi and Xj for 
i = 1,2 such that 0 < Py*|z=^.(a;,) < maxPY*iz=z(x'). Dehne S by 


1 


1 


mm ^ymax(2i) ^ : D'(x, zi) ^ t(zi)} ’ # {x : D'(x, Z 2 ) > t(z 2 )} 


ymax(z2) 


By Lemma 2 we immediately obtain that 5^0. It follows easily from the dehnition of S that 
the number —6 satishes (44) with z = and that 5 satisfies (44) for z = Z 2 - We can see now 


that if we replace the distribution Y*\Z = zi by and the distribution Y*\Z = Z 2 by then 
we obtain the distribution Y'\Z satisfying conditions in Lemma 2 and Hoc,{Y'\Z) = k. Finally, 
observe that 6 = {x\ 2 )>t{z 2 )} ~ ymax(22) means that the distribution W|Z = ^2 is uniform on 

{x : D'{x.Z 2 ) > t{z 2 )]. In turn, if (5 = ymax(^i) - #{x:D'(x,2i)^t(zi)} distribution Y'\Z = zi 

is uniform on {x : D'(x,2:i) ^ t{zi)}. □ □ 


F Proof of Claim 5, Lemma 3 

Proof. We check that lim5_5.o h{s) = ai and thus the function h is continuous on the interval [0,1]. 
This means that h attains its minimum at some point s = sq. There is nothing to prove if sq G {0,1}. 
Suppose that sq G (0,1). Then we must have = 0. The hrst derivative of the function h is 

given by the following formula 


dh 

ds 


si{a + s)(l — s)^ ^ + a ((1 — sY — l) 


—a + (1 — sY ^ (a(l — s) + (a + s)£s) 


(45) 


Therefore for s = sq we obtain (1 — sq)^ ^ 


a(l-so)+(a+^*o)^«0 


and hence 


/i(so) = (1 - (1 - So) • (1 - so)^ ^) (l + asQ 
_ (g + sq)^^ 

a(l - So) + (g + so)^so 


(46) 


Note that the last expression is increasing with respect to i and that from the assumption we have 


£ > Using this we obtain 


a+so 


h{so) ^ 


(g + so)(l + g) 


— 1 H" fl 


g(l - So) + (1 + g)so 
which completes the proof. □ 

The lemma follows now immediately by combining (33) and the last claim. 


(47) 


□ 


□ 

□ 
G Proof of Lemma 4


of Lemma. It is easy to see that lim^_>o+ 9{d) = We have 

dg{d) (l-d)^-i(d(£-l) + l)-l 


dd 


d? 


(48) 


Using the inequality 1 — d ^ e we obtain 


dg{d) ^ e (d(^ - 1) + 1) - 1 ^ ^ 


dd 


d? 


Where the second inequality follows from the inequality e^ ^ 1 + s applied for s = d{i — 1). This 
proves (a). The second derivative is given by 


d‘^g{d) _ (1 - dy-^ (2 + 2d{i - 2) + d‘^{{i - 2^ + £ - 2)) - 2 


ddf 


d? 


(49) 


Using 1 —d ^ e “ and applying the inequality e^ ^ 1 + s+^s^, which holds for s ^ 0, for s = d(f'—1) 
we obtain 

d‘^g{d) _ (1 - d)^-2 (2 + 2d{£ - 2) + df{{£ - 2)^ + £ - 2)) - 2 


dd? 


d3 


- 


- 


(1 - d)^-^ (2 + 2d{i -l) + d‘^{i- 1)2) - 2 


d3 


,-d{e-i) (2 + 2d{i - 1) + d2(£ - 1)2) - 2 


d3 


2-2 


which proves (b). Finally, note that by convexity we have 


9{d2) - 9{di) ^ (d2 - di) 


dg{d) 


dd 


d — d\ 


Since g{d) > 0 and ^ ^ /did) we can rewrite this 


as 




9idi) 


dd 


(50) 


(51) 


(52) 


d=di 


Note that the function d —)• ln 5 r(d) is convex, as the composition of the convex function g{-) and 
the convex increasing function In(-). Therefore, 


dlng{d) dlng{d) 


dd " dd 

Combining the last two inequalities yields 


1 


d=0 


g{d2) - gidi) 

-T—-> —- • (d2 — di), d2 — di > 0. 

g{di) 2 


which completes the proof of (c). 


□ 


(53) 

(54) 

□ 
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